(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date 
1 August 2002 (01.08.2002) 




PCT 



(10) International Publication Number 

WO 02/059271 A2 



(51) International Patent Classification 7: C12N

(21) International Application Number: PCT/US02/02176

(22) International Filing Date: 25 January 2002 (25.01.2002) 

(25) Filing Language: English 

(26) Publication Language: English 



(30) Priority Data:

60/263,757   25 January 2001 (25.01.2001)   US
60/286,090   25 April 2001 (25.04.2001)   US
60/292,517   23 May 2001 (23.05.2001)   US



(71) Applicant (for all designated States except US): GENE
LOGIC, INC. [US/US]; 708 Quince Orchard Road,
Gaithersburg, MD 20878 (US).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): ORR, Michael, S.
[US/US]; c/o GENE LOGIC, INC., 708 Quince Orchard
Road, Gaithersburg, MD 20878 (US). NATION, Michele
[US/US]; c/o GENE LOGIC, INC., 708 Quince Orchard
Road, Gaithersburg, MD 20878 (US). DIGGANS, James,
C. [US/US]; c/o GENE LOGIC, INC., 708 Quince Orchard
Road, Gaithersburg, MD 20878 (US). ZENG, Wen
[CN/US]; c/o GENE LOGIC, INC., 708 Quince Orchard
Road, Gaithersburg, MD 20878 (US).

(74) Agent: MORGAN, LEWIS & BOCKIUS LLP; TUSCAN,
Michael S., WEIMAR, Elizabeth C. et al., 1111
Pennsylvania Avenue, N.W., Washington, DC 20004 (US).

(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SD, SE, SG,
SI, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ,
VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW),
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR,




(54) Title: GENE EXPRESSION PROFILES IN BREAST TISSUE



[Front-page figure: PCA mapped data, 33 tissue samples]



(57) Abstract: The present invention results from the examination of tissue from breast carcinomas to identify genes differentially
expressed between tumor biopsies and normal tissue. The invention includes diagnostic and screening methods using these genes as
well as solid supports comprising oligonucleotide arrays that are complementary to or hybridize to the differentially expressed genes.






GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OAPI patent
(BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR,
NE, SN, TD, TG).



Published:
- without international search report and to be republished upon receipt of that report
- with sequence listing part of description published separately in electronic form and available upon request from the International Bureau

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.



WO 02/059271 PCT/US02/02176 
GENE EXPRESSION PROFILES IN BREAST TISSUE 



INVENTORS: Michael S. Orr, Michele Nation, J. C. Diggans and Wen Zeng

RELATED APPLICATIONS

This application claims the priority of U.S. Provisional Application Nos.
60/263,757, filed January 25, 2001, 60/286,090, filed April 25, 2001, and 60/292,517, filed
May 23, 2001, all of which are herein incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

One of the most pressing health issues today is breast cancer. In the industrialized
world, about one woman in every nine can expect to develop breast cancer in her lifetime.
In the United States, it is the most common cancer among women, with an annual
incidence of about 175,000 new cases and nearly 50,000 deaths. Despite ongoing
improvements in our understanding of the disease, breast cancer has remained resistant to
medical intervention. Most clinical initiatives focus on early diagnosis, followed by
conventional forms of intervention, particularly surgery and chemotherapy. Such
interventions have had limited success, particularly in patients whose tumors have
metastasized. There is a pressing need to expand the arsenal of available therapies to provide
more precise and more effective treatment in a less invasive way. A promising area for the
development of new modalities has emerged from recent understanding of the genetics of
cancer.

One model used to characterize breast carcinogenesis asserts that normal cells
undergo a multi-step process that broadly includes hyperplasia, pre-malignant change and
in situ carcinoma. Multiple factors lead to atypical cell proliferation followed by carcinoma
in situ. Carcinoma in situ is characterized as either ductal or lobular in form, and the
majority of invasive carcinomas (85-95%) are classified as ductal. Among the ductal
carcinomas, 15-20% are special histological types, including tubular, medullary, mucinous,
papillary, adenoid cystic, metaplastic, apocrine, squamous, secretory, lipid-rich and cystic
hypersecretory carcinomas, while the remaining ductal carcinomas are not otherwise specified.

To date, researchers have identified a few genetic alterations believed to underlie
tumor development. These alterations include amplification of oncogenes and mutations
that result in the loss of tumor suppressor genes. Tumor suppressor genes are genes that, in
their wild-type alleles, express proteins that suppress abnormal cellular proliferation. When
the gene coding for a tumor suppressor protein is mutated or deleted, the resulting mutant
protein, or the complete lack of tumor suppressor protein expression, may fail to regulate
cellular proliferation correctly, and abnormal proliferation may take place, particularly if the
cellular regulatory mechanism is already damaged.

A number of well-studied human tumors and tumor cell lines have missing or
nonfunctional tumor suppressor genes. Examples of tumor suppressor genes include, but are
not limited to, the retinoblastoma susceptibility (RB) gene, the p53 gene, the deleted in
colon carcinoma (DCC) gene and the neurofibromatosis type 1 (NF-1) tumor suppressor
gene (Weinberg, Science 254, 1138-1146 (1991)). Loss of function or inactivation of tumor
suppressor genes may play a central role in the initiation and/or progression of a significant
number of human cancers.

Classification of heterogeneous populations of tumor types is a daunting task; yet
studies using gene expression patterns to identify subtypes of cancer have produced
initial results (see Perou, C. M. et al., Proc Natl Acad Sci USA 96, 9212-9217 (1999);
Golub, T. R. et al., Science 286, 531-7 (1999); Alizadeh, A. A. et al., Nature 403, 503-11
(2000); Alon, U. et al., Proc Natl Acad Sci USA 96, 6745-50 (1999); and Bittner, M. et al.,
Nature 406, 536-40 (2000)). For example, molecular classification of B-cell lymphoma by
gene expression profiling identified clinically distinct subgroups of diffuse large B-cell
lymphoma (see Alizadeh, supra). Stratifying patients by their distinctive gene expression
profiles may allow researchers to group similar patient populations precisely when
evaluating chemotherapeutic agents. A more homogeneous patient population reduces the
variability of patient-to-patient responses, which may lead to the development of agents
capable of eradicating specific cancer subtypes that standard classification techniques could
not previously distinguish.

A study by Martin et al. (Cancer Res 60, 2232-8 (2000)) used a custom microarray
composed of 124 genes, discovered by differential display, that were associated with either
normal breast epithelial cells or the MDA-MB-435 malignant breast tumor cell line. Using
the custom microarray, the researchers examined the relationship between expression
patterns, discovered by clustering a number of genes, and clinical stages of breast cancer,
and found that gene expression patterns were capable of grouping breast tumors into distinct
categories (Martin et al., supra).

The use of gene expression profiles to classify tumors, to identify drug targets, to
identify diagnostic markers and/or to gain further insight into the consequences of
chemotherapeutic treatments could facilitate the design of more efficacious patient-specific
strategies for treating a variety of cancers. In breast cancer, studies using limited numbers
of genes have classified tumors into subtypes based on gene expression profiles, revealing a
diversity of molecular phenotypes associated with breast tumors (Perou, C. M. et al.,
Nature 406, 747-52 (2000)).

Although these studies have demonstrated that expression profiling may be used to
improve the diagnosis of breast cancer and to develop improved therapeutic strategies,
further studies are needed, as only a small portion of the genome was studied; analyses
covering greater numbers of genes will advance our understanding of breast tumors even
further. Accordingly, there remains a need in the art for materials and methods that permit a
more accurate diagnosis of breast cancer and, in particular, ductal carcinoma. In addition,
there remains a need in the art for methods of treating breast cancer and for methods of
identifying agents that can effectively treat it. The present invention meets these and other
needs.

SUMMARY OF THE INVENTION

The present invention is based on the discovery of genes, and their expression
profiles, associated with various types and stages of breast cancer.

The invention includes methods of diagnosing breast cancer in a patient comprising
the step of detecting the level of expression in a tissue sample of two or more genes from
Tables 1-5, wherein differential expression of the genes in Tables 1-5 is indicative of breast
cancer.

The invention also includes methods of detecting the progression of breast cancer.
For instance, methods of the invention include detecting the progression of breast cancer in
a patient comprising the step of detecting the level of expression in a tissue sample of two or
more genes from Tables 1-5, wherein differential expression of the genes in Tables 1-5 is
indicative of breast cancer progression. In some preferred embodiments, PCA (Principal
Component Analysis) based on all or a portion of the group of 50 genes identified in Table
1 may be used to differentiate between stages of breast cancer, such as normal versus DCIS
(ductal carcinoma in situ) or DCIS versus microinvasive tissue samples. In some preferred
embodiments, one or more genes may be selected from Tables 1, 3, 4 and/or 5.

In some aspects, the present invention provides a method of monitoring the
treatment of a patient with breast cancer, comprising administering a pharmaceutical
composition to the patient, preparing a gene expression profile from a cell or tissue sample
from the patient, and comparing the patient gene expression profile to a gene expression
profile from a cell population comprising normal breast cells, to a gene expression profile
from a cell population comprising breast cancer cells, or to both. In some preferred
embodiments, the gene expression profile will include the expression level of one or more
genes in Tables 1-5.

Another aspect of the present invention includes a method of treating a patient with
breast cancer, comprising administering to the patient a pharmaceutical composition,
wherein the composition alters the expression of at least one gene in Tables 1-5, preparing a
gene expression profile from a cell or tissue sample from the patient comprising tumor cells,
and comparing the patient expression profile to a gene expression profile from an untreated
cell population comprising breast cancer cells.

In another aspect, the present invention provides a method of identifying ductal
carcinoma in a patient, comprising detecting the level of expression in a tissue sample of
two or more genes from Tables 1-5, wherein differential expression of the genes in Tables
1-5 is indicative of ductal carcinoma. In addition, by determining the expression level of
two or more genes in the group of genes listed in Tables 1-5, one skilled in the art can
differentiate between DCIS and a cribriform type of DCIS that is more prone to
microinvasion.

In another aspect, the present invention provides a method of detecting the
progression of carcinogenesis in a patient, comprising detecting the level of expression in a
tissue sample of two or more genes from Tables 1-5, wherein differential expression of the
genes in Tables 1-5 is indicative of breast carcinogenesis. Figures 6 and 7 are graphical
representations of how the genes listed in Table 5 cluster with disease stages in breast cancer.

The invention further includes methods of screening for an agent capable of
modulating the onset or progression of breast cancer, comprising the steps of exposing a cell
to the agent and detecting the expression level of two or more genes from Tables 1-5. In
some embodiments, the breast cancer may be a ductal carcinoma. In some preferred
embodiments, one or more genes may be selected from the group consisting of those listed in
Tables 1, 3, 4 and/or 5. In some preferred methods, it may be desirable to detect all or
nearly all of the genes in the tables.

The invention further includes compositions comprising at least two
oligonucleotides, wherein each of the oligonucleotides comprises a sequence that
specifically hybridizes to a gene in Tables 1-5, as well as solid supports comprising at least
two probes, wherein each of the probes comprises a sequence that specifically hybridizes to
a gene in Tables 1-5. In some preferred embodiments, one or more genes may be selected
from the group consisting of those listed in Tables 1, 3, 4 and/or 5.

The invention further includes computer systems comprising a database containing
information identifying the expression level in breast tissue of a set of genes comprising at
least two genes in Tables 1-5, and a user interface to view the information. In some
preferred embodiments, one or more genes may be selected from the group consisting of those
listed in Tables 1, 3, 4 and/or 5. The database may further include sequence information for
the genes and information identifying the expression level of the set of genes in normal
breast tissue and cancerous tissue, and may contain links to external databases such as
GenBank.

Lastly, the invention includes methods of using the databases, such as methods of
using the disclosed computer systems to present information identifying the expression level
in a tissue or cell of at least one gene in Tables 1-5, comprising the step of comparing the
expression level of at least one gene in Tables 1-5 in the tissue or cell to the level of
expression of the gene in the database. In some preferred embodiments, two or more genes
may be selected from the group consisting of those listed in Tables 1, 3, 4 and/or 5.
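As a rough illustration of comparing a tissue's expression level with the level stored in such a database, the sketch below keeps reference expression levels in an in-memory SQLite table and reports the ratio of a sample's level to each stored reference level for one gene. The table layout, the gene identifiers (GENE_A, GENE_B) and the numeric values are hypothetical; the application does not specify a schema or a comparison metric.

```python
import sqlite3


def build_reference_db(rows):
    """Create an in-memory table of reference expression levels.

    rows: iterable of (gene_id, tissue, level) tuples (hypothetical layout).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression ("
        " gene_id TEXT NOT NULL,"
        " tissue TEXT NOT NULL,"
        " level REAL NOT NULL,"
        " PRIMARY KEY (gene_id, tissue))"
    )
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def compare_to_reference(conn, gene_id, sample_level):
    """Ratio of a sample's level to every stored reference level for one gene.

    Reference levels of zero are skipped to avoid division by zero.
    """
    cursor = conn.execute(
        "SELECT tissue, level FROM expression WHERE gene_id = ? ORDER BY tissue",
        (gene_id,),
    )
    return {tissue: sample_level / level for tissue, level in cursor if level > 0}


conn = build_reference_db([
    ("GENE_A", "normal", 100.0),
    ("GENE_A", "carcinoma", 400.0),
    ("GENE_B", "normal", 250.0),
])
ratios = compare_to_reference(conn, "GENE_A", 380.0)
print(ratios)  # {'carcinoma': 0.95, 'normal': 3.8}
```

A ratio near 1 indicates that the sample resembles that reference tissue for the gene in question; a real system would add sequence data and external links as described above.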

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is an E-northern showing the expression of topoisomerase II alpha in
various tissue types.

Figure 2 is an E-northern showing the expression of ICBP90 in various tissue types.

Figure 3 is an E-northern showing the expression of the MCT4 gene.

Figure 4 is an E-northern showing the expression of the frizzled related protein.

Figure 5 is an E-northern showing the expression of an EST, Affy ID AI668620.

Figure 6 is a PCA of the set of 28 samples using the top 50 genes identified by
p-values.

Figure 7 is a PCA of the set of 33 samples using the top 50 genes and ESTs
identified by p-values.

Figure 8 is a PCA of the set of 91 samples using the top 31 myo-lamina genes and
ESTs.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Many biological functions are accomplished by altering the expression of various
genes through transcriptional (e.g., through control of initiation, provision of RNA
precursors, RNA processing, etc.) and/or translational control. For example, fundamental
biological processes such as cell cycle, cell differentiation and cell death are often
characterized by variations in the expression levels of groups of genes.

Changes in gene expression are also associated with pathogenesis. For example,
insufficient expression of functional tumor suppressor genes and/or overexpression of
oncogenes/proto-oncogenes could lead to tumorigenesis or hyperplastic growth of cells
(Marshall, Cell 64, 313-326 (1991); Weinberg, Science 254, 1138-1146 (1991)). Thus,
changes in the expression levels of particular genes (e.g., oncogenes or tumor suppressors)
serve as signposts for the presence and progression of various diseases.

Monitoring changes in gene expression may also provide certain advantages during
drug screening and development. Drugs are often pre-screened for the ability to interact
with a major target without regard to other effects the drugs have on cells. Such other
effects often cause toxicity in the whole animal, which prevents the development and use of
the potential drug.

Applicants have examined samples from normal breast tissue and from cancerous
breast tissue to identify global changes in gene expression between tumor biopsies and
normal tissue. These global changes in gene expression, also referred to as expression
profiles, provide useful markers for diagnostic uses as well as markers that can be used to
monitor disease states, disease progression, drug toxicity, drug efficacy and drug
metabolism.

The gene expression profiles described herein were derived from normal and tumor
samples from female patients between 39 and 52 years of age, of three different ethnic
origins (Caucasian, African-American and Asian). Infiltrating ductal carcinoma (IDC)
patient samples were studied for cancer-related expression, as 85% of the breast cancer
patients were afflicted with this form of the disease.

Histological analysis of each tissue sample was performed, and samples were
segregated into either normal or malignant categories. The normal tissue samples were
acquired from neighboring tissue of patients suffering from one of the following disorders:
macromastia, mild fibrosis, infiltrating lobular carcinoma, or infiltrating ductal carcinoma;
however, each tissue was diagnosed as normal by histological analysis. Samples were also
characterized by the type and grade of IDC for each patient sample used in the study.

The present invention provides compositions and methods to detect the level of
expression of genes that may be differentially expressed depending upon the state of the
cell, i.e., normal versus cancerous. These gene expression profiles provide molecular
tools for evaluating toxicity, drug efficacy, drug metabolism, development, and disease
monitoring. Changes in the expression profile from a baseline profile can be used as an
indication of such effects. Those skilled in the art can use any of a variety of known
techniques to evaluate the expression of one or more of the genes and/or gene fragments
identified in the instant application in order to observe changes in the expression profile in a
tissue or sample of interest.



Definitions

In the description that follows, numerous terms and phrases known to those skilled
in the art are used. In the interest of clarity and consistency of interpretation, definitions
of certain terms and phrases are provided.

As used herein, the phrase "detecting the level of expression" includes methods that
quantify expression levels as well as methods that determine whether a gene of interest is
expressed at all. Thus, an assay that provides a yes or no result without necessarily
providing quantification of an amount of expression is an assay that requires "detecting the
level of expression" as that phrase is used herein.

As used herein, the phrase "oligonucleotide sequences that are complementary to one or
more of the genes described herein" refers to oligonucleotides that are capable of hybridizing
under stringent conditions to at least part of the nucleotide sequence of said genes. Such
hybridizable oligonucleotides will typically exhibit at least about 75% sequence identity at
the nucleotide level to said genes, preferably about 80% or 85% sequence identity, or more
preferably about 90% or 95% or more nucleotide sequence identity to said genes.

"Bind(s) substantially" refers to complementary hybridization between a probe
nucleic acid and a target nucleic acid and embraces minor mismatches that can be
accommodated by reducing the stringency of the hybridization media to achieve the desired
detection of the target polynucleotide sequence.

The terms "background" or "background signal intensity" refer to hybridization
signals resulting from non-specific binding, or other interactions, between the labeled target
nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes,
control probes, the array substrate, etc.). Background signals may also be produced by
intrinsic fluorescence of the array components themselves. A single background signal can
be calculated for the entire array, or a different background signal may be calculated for
each target nucleic acid. In a preferred embodiment, background is calculated as the
average hybridization signal intensity for the lowest 5% to 10% of the probes in the array,
or, where a different background signal is calculated for each target gene, for the lowest 5%
to 10% of the probes for each gene. Of course, one of skill in the art will appreciate that
where the probes to a particular gene hybridize well and thus appear to be specifically
binding to a target sequence, they should not be used in a background signal calculation.
Alternatively, background may be calculated as the average hybridization signal intensity
produced by hybridization to probes that are not complementary to any sequence found in
the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found
in the sample, such as bacterial genes where the sample is mammalian nucleic acids).
Background can also be calculated as the average signal intensity produced by regions of
the array that lack any probes at all.
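A minimal sketch of the preferred background calculation, assuming probe intensities are available as plain lists of numbers: the background is the mean intensity of the dimmest fraction (5% to 10%) of probes, computed either across the whole array or separately from each gene's own probes. The function names and the rule of rounding the fraction to a whole number of probes are illustrative assumptions, and the sketch does not implement the exclusion of well-hybridizing probes mentioned above.

```python
def background_intensity(intensities, fraction=0.05):
    """Mean signal of the dimmest `fraction` of probes (5% to 10% preferred)."""
    if not intensities:
        raise ValueError("no probe intensities supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(intensities)
    # Round to the nearest whole probe, but always keep at least one.
    n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n]) / n


def per_gene_background(probes_by_gene, fraction=0.05):
    """Separate background for each target gene, from that gene's own probes."""
    return {gene: background_intensity(values, fraction)
            for gene, values in probes_by_gene.items()}


signals = list(range(1, 101))  # 100 hypothetical probe intensities
print(background_intensity(signals, 0.05))  # 3.0
print(background_intensity(signals, 0.10))  # 5.5
```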

The phrase "hybridizing specifically to" refers to the binding, duplexing or
hybridizing of a molecule substantially to or only to a particular nucleotide sequence or
sequences under stringent conditions when that sequence is present in a complex mixture
(e.g., total cellular) of DNA or RNA.

Assays and methods of the invention may use available formats to simultaneously
screen at least about 100, preferably about 1000, more preferably about 10,000 and most
preferably about 1,000,000 or more different nucleic acid hybridizations.

The terms "mismatch control" or "mismatch probe" refer to a probe whose sequence
is deliberately selected not to be perfectly complementary to a particular target sequence.
For each mismatch (MM) control in a high-density array there typically exists a
corresponding perfect match (PM) probe that is perfectly complementary to the same
particular target sequence. The mismatch may comprise one or more bases that are not
complementary to the corresponding bases of the target sequence.

While the mismatch(es) may be located anywhere in the mismatch probe, terminal
mismatches are less desirable, as a terminal mismatch is less likely to prevent hybridization
of the target sequence. In a particularly preferred embodiment, the mismatch is located at or
near the center of the probe, such that the mismatch is most likely to destabilize the duplex
with the target sequence under the test hybridization conditions.
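The sketch below shows one way to derive a mismatch control from a perfect match probe by altering a single central base, as the preferred embodiment suggests. Substituting the Watson-Crick complement at the middle position (the 13th base of a 25-mer) is an assumption modeled on common high-density array designs, not a detail stated in this application; the probe sequence is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm_probe: str) -> str:
    """Return a mismatch (MM) control for a perfect match (PM) probe.

    Assumption: only the central base is changed, to its Watson-Crick
    complement, so PM and MM differ at exactly one position.
    """
    seq = pm_probe.upper()
    if not seq or any(base not in COMPLEMENT for base in seq):
        raise ValueError("probe must be a non-empty A/C/G/T sequence")
    centre = len(seq) // 2  # index 12, i.e. the 13th base, for a 25-mer
    return seq[:centre] + COMPLEMENT[seq[centre]] + seq[centre + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer PM probe
mm = mismatch_probe(pm)
print(mm)  # ACGTACGTACGTTCGTACGTACGTA
```

Placing the single substitution in the middle, rather than at either end, follows the reasoning above that a terminal mismatch is less likely to destabilize the duplex.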

The term "perfect match probe" refers to a probe that has a sequence that is perfectly
complementary to a particular target sequence. The test probe is typically perfectly
complementary to a portion (subsequence) of the target sequence. The perfect match (PM)
probe can be a "test probe", a "normalization control" probe, an expression level control
probe and the like. A perfect match control or perfect match probe is, however,
distinguished from a "mismatch control" or "mismatch probe."

As used herein, a "probe" is defined as a nucleic acid, preferably an oligonucleotide,
capable of binding to a target nucleic acid of complementary sequence through one or more
types of chemical bonds, usually through complementary base pairing, usually through
hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, U, C or
T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in probes may
be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with
hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are
joined by peptide bonds rather than phosphodiester linkages.

The term "stringent conditions" refers to conditions under which a probe will
hybridize to its target subsequence, but with only insubstantial hybridization to other
sequences, or with hybridization to other sequences such that the difference may be
identified. Stringent conditions are sequence-dependent and will be different in different
circumstances. Longer sequences hybridize specifically at higher temperatures. Generally,
stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm)
for the specific sequence at a defined ionic strength and pH.

Typically, stringent conditions will be those in which the salt concentration is at
least about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the
temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides). Stringent
conditions may also be achieved with the addition of destabilizing agents such as
formamide.
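As a worked illustration of choosing a hybridization temperature about 5°C below Tm, the sketch below estimates Tm with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. That rule is only a rough approximation for short oligonucleotides at relatively high salt concentration; it is an assumption introduced here, not the method used in the application, and it ignores formamide and the exact ionic strength and pH.

```python
def wallace_tm(oligo: str) -> float:
    """Rough Tm estimate in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("oligo must be a non-empty A/C/G/T sequence")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(oligo: str, offset: float = 5.0) -> float:
    """Hybridization temperature about `offset` degrees C below the estimated Tm."""
    return wallace_tm(oligo) - offset


probe = "ACGTTGCAACGTTG"  # hypothetical 14-mer: 7 A/T and 7 G/C
print(wallace_tm(probe))             # 42.0
print(stringent_temperature(probe))  # 37.0
```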

The "percentage of sequence identity" or "sequence identity" is determined by
comparing two optimally aligned sequences or subsequences over a comparison window or
span, wherein the portion of the polynucleotide sequence in the comparison window may
optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence
(which does not comprise additions or deletions) for optimal alignment of the two
sequences. The percentage is calculated by determining the number of positions at which
the identical subunit (e.g., nucleic acid base or amino acid residue) occurs in both sequences
to yield the number of matched positions, dividing the number of matched positions by the
total number of positions in the window of comparison and multiplying the result by 100 to
yield the percentage of sequence identity. Percentage sequence identity calculated
using the programs GAP or BESTFIT (see below) uses default gap weights.
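The identity calculation defined above reduces to counting matched positions in an existing alignment. The sketch below assumes the two sequences have already been optimally aligned (for example by GAP, BESTFIT or BLAST) and are given as equal-length strings with "-" marking gaps; gap positions count toward the window length but never as matches. The function name, gap character and example sequences are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of window positions holding the same base in both sequences.

    Both inputs are rows of an existing alignment; gap positions are part of
    the comparison window but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(identity)          # 80.0
print(identity >= 75.0)  # True: meets the "at least about 75%" threshold
```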

Homology or identity may be determined by BLAST (Basic Local Alignment
Search Tool) analysis using the algorithm employed by the programs blastp, blastn, blastx,
tblastn and tblastx (Karlin et al., Proc Natl Acad Sci USA 87, 2264-2268 (1990) and
Altschul, J Mol Evol 36, 290-300 (1993), fully incorporated by reference), which are tailored
for sequence similarity searching. The approach used by the BLAST program is first to
consider similar segments between a query sequence and a database sequence, then to
evaluate the statistical significance of all matches that are identified, and finally to
summarize only those matches that satisfy a preselected threshold of significance. For a
discussion of basic issues in similarity searching of sequence databases, see Altschul et al.
(Nature Genet 6, 119-129 (1994)), which is fully incorporated by reference. The search
parameters for histogram, descriptions, alignments, expect (i.e., the statistical significance
threshold for reporting matches against database sequences), cutoff, matrix and filter are at
the default settings. The default scoring matrix used by blastp, blastx, tblastn, and tblastx is
the BLOSUM62 matrix (Henikoff et al., Proc Natl Acad Sci USA 89, 10915-10919 (1992),
fully incorporated by reference). Four blastn parameters were adjusted as follows: Q=10
(gap creation penalty); R=10 (gap extension penalty); wink=1 (generates word hits at every
wink-th position along the query); and gapw=16 (sets the window width within which gapped
alignments are generated). The equivalent blastp parameter settings were Q=9; R=2;
wink=1; and gapw=32. A Bestfit comparison between sequences, available in the GCG
package version 10.0, uses DNA parameters GAP=50 (gap creation penalty) and LEN=3
(gap extension penalty), and the equivalent settings in protein comparisons are GAP=8 and
LEN=2.

Uses of Differentially Expressed Genes

The present invention identifies those genes differentially expressed between normal
breast tissue and cancerous breast tissue. One of skill in the art can select one or more of
the genes identified as differentially expressed in Tables 1-5 and use the information
and methods provided herein to interrogate or test a particular sample. For a particular
interrogation of two conditions or sources, it may be desirable to select those genes that
display a large difference in expression between the two conditions or sources. At least a
two-fold difference may be desirable, but a three-fold, five-fold or ten-fold difference may be
preferred in some instances. Interrogations of the genes or proteins can be performed to
yield different information.
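A minimal sketch of the fold-change filter described above, assuming per-gene expression values for each condition are held in dictionaries keyed by a gene identifier. Fold change is taken here as the ratio of the larger to the smaller condition mean, so over- and under-expression are treated symmetrically; that choice, the function names and the gene identifiers (G1 to G3) are assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def fold_change(values_a, values_b):
    """Ratio of the larger condition mean to the smaller (always >= 1)."""
    a, b = mean(values_a), mean(values_b)
    if a <= 0 or b <= 0:
        raise ValueError("fold change needs positive mean expression values")
    return max(a, b) / min(a, b)


def select_by_fold_change(condition_a, condition_b, min_fold=2.0):
    """Genes whose means differ by at least `min_fold`, largest change first.

    condition_a, condition_b: dicts mapping gene id -> list of expression values.
    """
    hits = []
    for gene in condition_a.keys() & condition_b.keys():
        fc = fold_change(condition_a[gene], condition_b[gene])
        if fc >= min_fold:
            hits.append((gene, fc))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


normal = {"G1": [100, 120], "G2": [50, 50], "G3": [10, 10]}
tumor = {"G1": [400, 480], "G2": [60, 40], "G3": [40, 20]}
print(select_by_fold_change(normal, tumor, 2.0))  # [('G1', 4.0), ('G3', 3.0)]
print(select_by_fold_change(normal, tumor, 5.0))  # []
```

Raising `min_fold` from 2 to 3, 5 or 10 trades the number of selected genes for a larger separation between the two conditions.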

Diagnostic Uses for the Breast Cancer Markers

As described herein, the genes and gene expression information provided in Tables
1-5 may be used as diagnostic markers for the prediction or identification of the malignant
state of breast tissue. For instance, a breast tissue sample or other sample from a patient
may be assayed by any of the methods known to those skilled in the art, and the expression
levels of one or more genes from Tables 1-5 may be compared to the expression levels
found in normal breast tissue, tissue from breast carcinoma, or both. Expression profiles
generated from the tissue or other samples that substantially resemble an expression profile
from normal or diseased breast tissue may be used, for instance, to aid in disease diagnosis.
Comparison of the expression data, as well as available sequence or other information, may
be done by a researcher or diagnostician or with the aid of a computer and
databases as described herein.
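The application does not define how closely a profile must "substantially resemble" a reference. One common choice, used in the sketch below purely as an assumption, is the Pearson correlation between the sample profile and each reference profile over the same ordered set of genes, assigning the sample to the most highly correlated reference. The reference profiles and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def closest_reference(sample, references):
    """Label of the reference profile most correlated with `sample`.

    references: dict mapping a label to a profile in the same gene order.
    """
    scores = {label: pearson(sample, profile) for label, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


references = {
    "normal": [1.0, 2.0, 8.0, 9.0],  # hypothetical reference profiles
    "carcinoma": [9.0, 8.0, 2.0, 1.0],
}
label, scores = closest_reference([8.5, 7.0, 3.0, 1.5], references)
print(label)  # carcinoma
```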

For example, genes over-expressed by 3-fold or greater and having the
smallest p-values from a t-test were identified by comparing 13 normal tissue samples and
15 infiltrating ductal carcinoma tissue samples, composed mostly of stage II and III
samples. This analysis provided a set of genes (listed in Table 1) capable of distinguishing
between the 13 normal and 15 tumor samples by principal component analysis (PCA). To
evaluate the ability of these genes to distinguish between normal and tumor tissue
samples, a group of 33 tissues was selected from an existing gene expression database
composed of normal, benign, DCIS (ductal carcinoma in situ), microinvasive, stage I, stage
II and stage III breast cancer samples. PCA of the 33 tissue samples indicated that the
genes selected on the basis of the smallest p-values classified 32 of the 33 tissue samples
correctly; one stage I tissue sample was misclassified as a normal sample.
Accordingly, these genes can be used diagnostically to differentiate normal/benign samples
from tissue samples containing intraductal or infiltrating ductal carcinoma of the breast.
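The PCA analysis described above can be sketched in Python with NumPy. The sketch fits PCA by singular value decomposition on labeled training samples and then, as an added assumption not stated in the text, classifies new samples by the nearest class centroid in principal-component space. All data are simulated; the function names and the choice of two components are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components=2):
    """Fit PCA by singular value decomposition.

    X is a samples-by-genes matrix (e.g. log2 expression). Returns the gene
    means and the top principal axes (one row per component).
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_project(X, mean, axes):
    """Coordinates of samples on the principal axes."""
    return (X - mean) @ axes.T


def nearest_centroid(train_scores, train_labels, test_scores):
    """Assign each test sample to the class whose PCA centroid is closest."""
    labels = np.asarray(train_labels)
    classes = sorted(set(train_labels))
    centroids = np.array([train_scores[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_scores[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[i] for i in dists.argmin(axis=1)]


rng = np.random.default_rng(0)


def simulate(n, tumor):
    """Simulated log2 data for 10 genes; tumors raise the first 5 genes ~8-fold."""
    X = rng.normal(8.0, 0.3, size=(n, 10))
    if tumor:
        X[:, :5] += 3.0
    return X


# 13 normal and 15 tumor training samples, as in the study design above.
X_train = np.vstack([simulate(13, False), simulate(15, True)])
y_train = ["normal"] * 13 + ["tumor"] * 15
mean, axes = pca_fit(X_train)
train_scores = pca_project(X_train, mean, axes)

X_test = np.vstack([simulate(3, False), simulate(3, True)])
pred = nearest_centroid(train_scores, y_train, pca_project(X_test, mean, axes))
print(pred)
```

Real expression data would show far less separation than this simulation, which is why the study reports one misclassified stage I sample.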

In another study, PCA based on this group of genes indicated that these genes
may be used to differentiate among normal tissue and the different stages of breast cancer,
for example normal versus DCIS or DCIS versus microinvasive tissue samples, as shown
graphically in Figures 6 and 7. The DCIS sample that contained focal microinvasions was
grouped with the stage I and II tumor samples. This group of genes may therefore be used
to determine whether a DCIS sample contains microinvasions.



Use of the Breast Cancer Markers for Monitoring Disease Progression 

Molecular expression markers for breast cancer can be used to confirm a classification of
the type and progression of cancer made on the basis of morphological criteria. For example, normal
breast tissue could be distinguished from invasive carcinoma based on the level and type of
genes expressed in a tissue sample. In some situations, the identification of cell type or source
is ambiguous on the basis of classical criteria. In these situations, the molecular expression
markers of the present invention are useful.

In addition, progression of ductal carcinoma in situ to microinvasive carcinoma can
be monitored by following the expression patterns of the involved genes using the
molecular expression markers of the present invention. The efficacy of
certain drug regimens can also be monitored by following the expression patterns of the
molecular expression markers.

In addition to the disease progression stages shown in Figures 6-7, other
developmental stages can be identified using these same molecular expression markers, as
shown in the examples below. While the importance of these markers in
development has been shown here, variations in their expression may occur at other times.
For example, variation in the expression level of one or more of the marker genes identified
herein may be used to distinguish benign stages of breast disease from malignant states.

As described above, the genes and gene expression information provided in Tables
1-5 may also be used as markers for the direct monitoring of disease progression, such as
the development of breast cancer. For instance, a breast tissue sample or other
sample from a patient may be assayed by any of the methods known to those of skill in the
art, and the expression levels in the sample of a gene or genes from Tables 1-5 may be
compared to the expression levels found in normal breast tissue, in tissue from breast cancer,
or in both. Comparison of the expression data, as well as of available sequence or other information,
may be done by a researcher or diagnostician or with the aid of a computer and
databases as described herein.

For instance, methods of this invention may use the 35-gene group (profile)
composed of genes expressed in myoepithelial cells and basal lamina components in Table
3. The absence of both myoepithelial cells and basement membrane components usually
indicates that the intraductal carcinoma is invasive. This group of 35 genes listed in Table 3
may be used to determine whether myoepithelial and/or basal lamina components are present in a
tissue sample. It includes 23 genes exhibiting a fold change of 3-fold or higher and 12
genes displaying a change of less than 3-fold. The subgroup of 23 genes was used to
distinguish between normal and tumor samples for a group of 33 tissue samples. In this
study, the 23 genes classified 32 of the 33 samples correctly, as well as 26 of the 28
samples used to isolate this subgroup of genes. This group of genes can be used to identify
the various stages of ductal carcinoma tissues more discretely than the 50-gene set. The
study also demonstrates that this group of genes can differentiate between DCIS and a
cribriform type of DCIS that is more prone to microinvasion. Clinically, the ability to
discern DCIS with microinvasions, or phenotypes prone to microinvasion such as the
cribriform type, would allow patients whose samples contain microinvasions to be subgrouped
as patients who should be treated more aggressively than DCIS patients lacking this gene
expression fingerprint. Based on its gene expression pattern, a subclass of DCIS (the
cribriform type) may be subgrouped with the microinvasive samples.
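As a hedged illustration of how the Table 3 panel might be used to judge whether myoepithelial and basal lamina components are present, the sketch below compares a sample's panel expression to a normal reference. The median-based score, the 3-fold cut-off and all example values are assumptions chosen for illustration, not criteria taken from the study.

```python
import numpy as np

# Hypothetical cut-off: a median loss of at least 3-fold relative to normal.
THREE_FOLD_LOSS = float(np.log2(1 / 3))


def panel_log2_change(sample, normal_reference) -> float:
    """Median log2(sample / normal) across a marker panel."""
    s = np.log2(np.asarray(sample, dtype=float))
    n = np.log2(np.asarray(normal_reference, dtype=float))
    return float(np.median(s - n))


def components_appear_lost(sample, normal_reference, cutoff=THREE_FOLD_LOSS) -> bool:
    """True if the panel is, on median, reduced at least as far as `cutoff`."""
    return bool(panel_log2_change(sample, normal_reference) <= cutoff)


# Hypothetical levels for four myoepithelial/basal lamina panel genes.
normal_ref = [200, 150, 300, 120]
intact_dcis = [180, 160, 250, 100]
depleted = [40, 30, 50, 30]
print(components_appear_lost(intact_dcis, normal_ref))  # False
print(components_appear_lost(depleted, normal_ref))     # True
```

The median is used so that one unusually variable panel gene does not decide the call on its own.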


Use of the Breast Cancer Markers for Drug Screening 

According to the present invention, potential drugs can be screened to determine whether
application of the drug alters the expression of one or more of the genes identified herein.
This may be useful, for example, in determining whether a particular drug is effective in
treating a particular patient with breast cancer. Where a gene's expression is
affected by the potential drug such that its level of expression returns to normal, the drug is
indicated in the treatment of breast cancer. Similarly, a drug which causes expression of a
gene which is not normally expressed by epithelial cells in the breast may be
contraindicated in the treatment of breast cancer.
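To illustrate what a return to normal expression could mean quantitatively, the sketch below reports, for each marker, the fraction of the tumor-versus-normal deviation that a drug removes. The text does not specify how this is scored; the metric, marker names and values are hypothetical.

```python
from typing import Dict


def fraction_restored(normal: Dict[str, float],
                      untreated: Dict[str, float],
                      treated: Dict[str, float]) -> Dict[str, float]:
    """Per marker, the fraction of the disease-associated deviation removed by a drug.

    1.0 means expression returned fully to normal, 0.0 means no change, and a
    negative value means the drug moved expression further from normal.
    Markers with no deviation in the untreated sample are skipped.
    """
    result = {}
    for gene, n in normal.items():
        gap = untreated[gene] - n
        if gap == 0:
            continue
        result[gene] = (untreated[gene] - treated[gene]) / gap
    return result


# Hypothetical levels for two markers in a breast cancer sample.
normal = {"MARKER_1": 100.0, "MARKER_2": 80.0}
untreated = {"MARKER_1": 600.0, "MARKER_2": 20.0}
treated = {"MARKER_1": 200.0, "MARKER_2": 10.0}
print(fraction_restored(normal, untreated, treated))
# MARKER_1 -> 0.8 (mostly restored); MARKER_2 -> about -0.17 (moved further from normal)
```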

According to the present invention, the genes identified in Tables 1-5 may also
be used as markers to evaluate the effects of a candidate drug or agent on a cell, particularly
a cell undergoing malignant transformation, for instance, a breast cancer cell or tissue
sample. A candidate drug or agent can be screened for the ability to stimulate the
transcription or expression of a given marker or markers (drug targets) or to down-regulate
or inhibit the transcription or expression of a marker or markers. According to the present
invention, one can also compare the specificity of a drug's effects by counting the number
of markers affected by the drug and comparing it to the number of markers affected by a
different drug. A more specific drug will affect fewer transcriptional targets. Similar sets
of markers identified for two drugs indicate a similarity of effects.
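The specificity comparison described above can be sketched as counting the markers each drug changes and measuring how much the two sets overlap. The 2-fold threshold and the use of the Jaccard index as the similarity measure are assumptions for illustration; marker names and values are hypothetical.

```python
from typing import Dict, Set


def affected_markers(baseline: Dict[str, float], treated: Dict[str, float],
                     min_fold: float = 2.0) -> Set[str]:
    """Markers whose expression changes at least `min_fold` (up or down) under a drug."""
    hits = set()
    for gene, b in baseline.items():
        t = treated[gene]
        if max(t / b, b / t) >= min_fold:
            hits.add(gene)
    return hits


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two marker sets: 1.0 identical, 0.0 disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical marker levels before treatment and after two different drugs.
baseline = {"M1": 100.0, "M2": 100.0, "M3": 100.0, "M4": 100.0}
drug_x = {"M1": 300.0, "M2": 100.0, "M3": 40.0, "M4": 110.0}
drug_y = {"M1": 250.0, "M2": 100.0, "M3": 100.0, "M4": 100.0}

hits_x = affected_markers(baseline, drug_x)  # {'M1', 'M3'}
hits_y = affected_markers(baseline, drug_y)  # {'M1'}
print(len(hits_x), len(hits_y), jaccard(hits_x, hits_y))  # 2 1 0.5
```

Here drug Y affects fewer markers and would be read as the more specific of the two, while the partial overlap indicates partly similar effects.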

Assays to monitor the expression of a marker or markers as defined in Tables 1-5
may utilize any available means of monitoring for changes in the expression level of the
nucleic acids of the invention. As used herein, an agent is said to modulate the expression
of a nucleic acid of the invention if it is capable of up- or down-regulating expression of the
nucleic acid in a cell.
Agents that are assayed in the above methods can be randomly selected or rationally
selected or designed. As used herein, an agent is said to be randomly selected when the
agent is chosen randomly without considering the specific sequences involved in the
association of a protein of the invention alone or with its associated substrates, binding
partners, etc. An example of randomly selected agents is the use of a chemical library, a
peptide combinatorial library or a growth broth of an organism.

As used herein, an agent is said to be rationally selected or designed when the agent
is chosen on a nonrandom basis which takes into account the sequence of the target site
and/or its conformation in connection with the agent's action. Agents can be selected or
designed by utilizing the peptide sequences that make up these sites. For example, a
rationally selected peptide agent can be a peptide whose amino acid sequence is identical to,
or a derivative of, any functional consensus site.

The agents of the present invention can be, as examples, peptides, small chemical
molecules, vitamin derivatives, as well as carbohydrates, lipids, oligonucleotides and
covalent and non-covalent combinations thereof. Dominant negative proteins, DNA
encoding these proteins, antibodies to these proteins, peptide fragments of these proteins or
mimics of these proteins may be introduced into cells to affect function. "Mimic" as used
herein refers to the modification of a region or several regions of a peptide molecule to
provide a structure chemically different from the parent peptide but topographically and
functionally similar to the parent peptide (see Grant in Molecular Biology and
Biotechnology, Meyers, ed., VCH Publishers (1995)). A skilled artisan can readily
recognize that there is no limit as to the structural nature of the agents of the present
invention.

Assay Formats

The genes identified as being differentially expressed in breast cancer may be used
in a variety of nucleic acid detection assays to detect or quantify the expression level of a
gene or multiple genes in a given sample. For example, traditional Northern blotting,
nuclease protection, RT-PCR and differential display methods may be used for detecting
gene expression levels.

The protein products of the genes identified herein can also be assayed to determine
the amount of expression. Methods for assaying for a protein include Western blot,
immunoprecipitation and radioimmunoassay. It is preferred, however, that the mRNA be
assayed as an indication of expression. Methods for assaying for mRNA include Northern
blots, slot blots, dot blots, and hybridization to an ordered array of oligonucleotides. Any
method for specifically and quantitatively measuring a specific protein, mRNA or DNA
product can be used. However, methods and assays of the invention are most efficiently
designed with PCR or array or chip hybridization-based methods for detecting the
expression of a large number of genes.

Any hybridization assay format may be used, including solution-based and solid
support-based assay formats. A preferred solid support is a high density array, also known
as a DNA chip or a gene chip. One variation of the DNA chip contains hundreds of
thousands of discrete microscopic channels that pass completely through it. Probe
molecules are attached to the inner surface of these channels, and molecules from the
samples to be tested flow through the channels, coming into close proximity with the probes
for hybridization. In one assay format, gene chips containing probes to at least two genes
from Tables 1-5 may be used to directly monitor or detect changes in gene expression in the
treated or exposed cell as described herein. Assays of the invention may measure the
expression levels of about one, two, three, five, seven, ten, 15, 20, 25, 50, 100 or more
genes in the Tables.

The genes and ESTs of the present invention may be assayed in any convenient
sample form. For example, samples may be assayed in the form of mRNA or reverse
transcribed mRNA. Samples may or may not be cloned, and the samples or individual genes
may or may not be amplified. The cloning itself does not appear to bias the representation of genes
within a population. However, it may be preferable to use polyA+ RNA as a source, as it
requires fewer processing steps. In some embodiments, it may be preferable to assay
the protein or peptide expressed by the gene.

The sequences of the expression marker genes of Tables 1-5 are available in the
public databases. Tables 1-5 provide the accession number and name for each of the
sequences. The sequences of the genes in GenBank are herein expressly incorporated by
reference in their entirety as of the filing date of this application (see
www.ncbi.nlm.nih.gov/).

Additional assay formats may be used to monitor the ability of the agent to modulate
the expression of a gene identified in Tables 1-5. For instance, as described above, mRNA
expression may be monitored directly by hybridization of probes to the nucleic acids of the
invention. Cell lines are exposed to an agent to be tested under appropriate conditions and
for an appropriate time, and total RNA or mRNA is then isolated by standard procedures
such as those disclosed in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989). In some embodiments, it may be
desirable to amplify one or more of the isolated RNA molecules prior to application of the
RNA to the gene chip. Using techniques well known in the art, the RNA may be reverse
transcribed and amplified in the form of DNA, or may be reverse transcribed into DNA and
the DNA used as a template for transcription to generate recombinant RNA. Any method
that results in the production of a sufficient quantity of nucleic acid to be hybridized
effectively to the gene chip may be used.

In another format, cell lines that contain reporter gene fusions between the open
reading frame and/or the 3' or 5' regulatory regions of a gene in Tables 1-5 and any
assayable fusion partner may be prepared. Numerous assayable fusion partners are known
and readily available, including the firefly luciferase gene and the gene encoding
chloramphenicol acetyltransferase (Alam et al., Anal. Biochem. 188, 245-254 (1990)). Cell
lines containing the reporter gene fusions are then exposed to the agent to be tested under
appropriate conditions and for an appropriate time. Differential expression of the reporter
gene between samples exposed to the agent and control samples identifies agents which
modulate the expression of the nucleic acid.
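Scoring such a reporter assay can be sketched as the ratio of mean reporter signal in agent-treated versus control samples. The replicate averaging, the 2-fold threshold and the readings shown are assumptions for illustration only; a practical screen would also assess replicate variability.

```python
from statistics import mean
from typing import Sequence, Tuple


def reporter_modulation(treated: Sequence[float], control: Sequence[float],
                        min_fold: float = 2.0) -> Tuple[str, float]:
    """Classify an agent by the ratio of mean reporter signal, treated vs. control."""
    ratio = mean(treated) / mean(control)
    if ratio >= min_fold:
        return "up", ratio
    if ratio <= 1 / min_fold:
        return "down", ratio
    return "unchanged", ratio


# Hypothetical luciferase readings (relative light units), three replicates each.
control = [1000, 1100, 900]
print(reporter_modulation([3100, 2900, 3000], control))  # ('up', 3.0)
print(reporter_modulation([400, 450, 350], control))     # ('down', 0.4)
```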

In another assay format, cells or cell lines are first identified which express one or
more of the gene products of the invention physiologically. Cells and/or cell lines so
identified would preferably comprise the necessary cellular machinery to ensure that the
transcriptional and/or translational apparatus of the cells would faithfully mimic the
response of normal or cancerous breast tissue to an exogenous agent. Such machinery
would likely include appropriate surface transduction mechanisms and/or cytosolic factors.
Such cell lines may be, but are not required to be, derived from breast tissue. The cells
and/or cell lines may then be contacted with an agent, and the expression of one or more of
the genes of interest may then be assayed. The genes may be assayed at the mRNA level
and/or at the protein level.

In some embodiments, such cells or cell lines may be transduced or transfected with
an expression vehicle (e.g., a plasmid or viral vector) containing an expression construct
comprising an operable, promoter-containing 5' end of a gene of interest identified in Tables
1-5 fused to one or more nucleic acid sequences encoding one or more antigenic fragments.
The construct may comprise all or a portion of the coding sequence of the gene of interest,
which may be positioned 5' or 3' to a sequence encoding an antigenic fragment. The
coding sequence of the gene of interest may or may not be translated after transcription
of the gene fusion. At least one antigenic fragment may be translated. The antigenic
fragments are selected so that the fragments are under the transcriptional control of the
promoter of the gene of interest and are expressed in a fashion substantially similar to the
expression pattern of the gene of interest. The antigenic fragments may be expressed as
polypeptides whose molecular weight can be distinguished from that of the naturally occurring
polypeptides. In some embodiments, gene products of the invention may further comprise
an immunologically distinct tag. Such a process is well known in the art (see Sambrook et
al., supra).

Cells or cell lines transduced or transfected as outlined above are then contacted
with agents under appropriate conditions; for example, the agent comprises a
pharmaceutically acceptable excipient and is contacted with cells in an aqueous
physiological buffer such as phosphate buffered saline (PBS) at physiological pH, Eagle's
balanced salt solution (BSS) at physiological pH, PBS or BSS comprising serum, or
conditioned media comprising PBS or BSS and serum, incubated at 37°C. Said conditions
may be modulated as deemed necessary by one of skill in the art. Subsequent to contacting
the cells with the agent, said cells will be disrupted and the polypeptides of the lysate
fractionated such that a polypeptide fraction is pooled and contacted with an antibody to be
further processed by immunological assay (e.g., ELISA, immunoprecipitation or Western
blot). The pool of proteins isolated from the "agent-contacted" sample will be compared
with a control sample in which only the excipient is contacted with the cells, and an increase or
decrease in the immunologically generated signal from the "agent-contacted" sample
compared to the control will be used to distinguish the effectiveness of the agent.

Another embodiment of the present invention provides methods for identifying
agents that modulate the levels, concentration or at least one activity of a protein(s) encoded
by the genes in Tables 1-5. Such methods or assays may utilize any means of monitoring or
detecting the desired activity.

In one format, the relative amounts of a protein of the invention produced in a cell
population that has been exposed to the agent to be tested may be compared to the amount
produced in an unexposed control cell population. In this format, probes such as specific
antibodies are used to monitor the differential expression of the protein in the different cell
populations. Cell lines or populations are exposed to the agent to be tested under
appropriate conditions and for an appropriate time. Cellular lysates may be prepared from
the exposed cell line or population and from a control, unexposed cell line or population.
The cellular lysates are then analyzed with the probe, such as a specific antibody.
Probe Design

Probes based on the sequences of the genes described herein may be prepared by
any commonly available method. Oligonucleotide probes for assaying the tissue or cell
sample are preferably of sufficient length to specifically hybridize only to appropriate,
complementary genes or transcripts. Typically the oligonucleotide probes will be at least
10, 12, 14, 16, 18, 20 or 25 nucleotides in length. In some cases longer probes of at least
30, 40, or 50 nucleotides will be desirable.
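Because a probe must be complementary to its target, one simple way to generate candidate probes of a chosen length is to take the reverse complement of successive windows along the target sequence. The sketch below assumes a sense-strand DNA target and uses hypothetical function names and sequence; it does not apply the further filters (e.g., against cross-hybridization or secondary structure) that a practical design would need.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(target: str, length: int = 25, step: int = 1) -> List[str]:
    """Antisense probes complementary to successive windows of a sense-strand target."""
    target = target.upper()
    return [reverse_complement(target[i:i + length])
            for i in range(0, len(target) - length + 1, step)]


# Hypothetical 14-nt target fragment, tiled with 10-nt probes every 2 bases.
probes = tile_probes("ATGCGTACGTTAGC", length=10, step=2)
print(probes[0], len(probes))  # ACGTACGCAT 3
```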

One of skill in the art will appreciate that an enormous number of array designs are
suitable for the practice of this invention. The high density array will typically include a
number of probes that specifically hybridize to the sequences of interest. See WO 99/32660
for methods of producing probes for a given gene or genes. In addition, in a preferred
embodiment, the array will include one or more control probes.

High density array chips of the invention include "test probes." Test probes may be
oligonucleotides that range from about 5 to about 500 or about 5 to about 50 nucleotides,
more preferably from about 10 to about 40 nucleotides and most preferably from about 15
to about 40 nucleotides in length. In other particularly preferred embodiments, the probes
are about 20 or 25 nucleotides in length. In another preferred embodiment, test probes are
double- or single-stranded DNA sequences. DNA sequences may be isolated or cloned from
natural sources or amplified from natural sources using natural nucleic acid as templates.
These probes have sequences complementary to particular subsequences of the genes whose
expression they are designed to detect. Thus, the test probes are capable of specifically
hybridizing to the target nucleic acid they are to detect.

In addition to test probes that bind the target nucleic acid(s) of interest, the high
density array can contain a number of control probes. The control probes fall into three
categories referred to herein as (1) normalization controls; (2) expression level controls; and
(3) mismatch controls.

Normalization controls are oligonucleotide or other nucleic acid probes that are
complementary to labeled reference oligonucleotides or other nucleic acid sequences that
are added to the nucleic acid sample. The signals obtained from the normalization controls
after hybridization provide a control for variations in hybridization conditions, label
intensity, "reading" efficiency and other factors that may cause the signal of a perfect
hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence
intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence
intensity) from the control probes, thereby normalizing the measurements.
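The normalization described above amounts to a division by the control signal. In the sketch below the control signal is taken as the mean of the normalization-control probes, which is an assumption when more than one control is present; the arrays and mask are hypothetical.

```python
import numpy as np


def normalize_to_controls(intensities, control_mask):
    """Divide every probe signal by the mean normalization-control signal."""
    intensities = np.asarray(intensities, dtype=float)
    control_signal = intensities[np.asarray(control_mask, dtype=bool)].mean()
    return intensities / control_signal


# Two hypothetical arrays; the second was read at twice the overall brightness.
mask = [False, False, True, True]  # last two probes are normalization controls
array_1 = [200, 400, 1000, 1000]
array_2 = [400, 800, 2000, 2000]
print(normalize_to_controls(array_1, mask))  # -> 0.2, 0.4, 1.0, 1.0
print(normalize_to_controls(array_2, mask))  # same values as array_1
```

After normalization the two arrays agree even though their raw intensities differ by a factor of two, which is the array-to-array variation the controls are meant to remove.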

Virtually any probe may serve as a normalization control. However, it is recognized
that hybridization efficiency varies with base composition and probe length. Preferred
normalization probes are selected to reflect the average length of the other probes present in
the array; however, they can be selected to cover a range of lengths. The normalization
control(s) can also be selected to reflect the (average) base composition of the other probes
in the array; however, in a preferred embodiment, only one or a few probes are used, and they
are selected such that they hybridize well (i.e., have no secondary structure) and do not match
any target-specific probes.

Expression level controls are probes that hybridize specifically with constitutively
expressed genes in the biological sample. Virtually any constitutively expressed gene
provides a suitable target for expression level controls. Typical expression level control
probes have sequences complementary to subsequences of constitutively expressed
"housekeeping genes" including, but not limited to, the β-actin gene, the transferrin receptor
gene, the GAPDH gene, and the like.

Mismatch controls may also be provided for the probes to the target genes, for
expression level controls or for normalization controls. Mismatch controls are
oligonucleotide probes or other nucleic acid probes identical to their corresponding test or
control probes except for the presence of one or more mismatched bases. A mismatched
base is a base selected so that it is not complementary to the corresponding base in the target
sequence to which the probe would otherwise specifically hybridize. One or more
mismatches are selected such that under appropriate hybridization conditions (e.g., stringent
conditions) the test or control probe would be expected to hybridize with its target sequence,
but the mismatch probe would not hybridize (or would hybridize to a significantly lesser
extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a
probe is a twenty-mer, a corresponding mismatch probe may have the identical sequence
except for a single base mismatch (e.g., substituting a G, a C or a T for an A) at any of
positions 6 through 14 (the central mismatch).
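A central mismatch probe can be derived from its perfect-match counterpart by a single base substitution. The sketch below substitutes the complementary base at the central position; this substitution rule is one illustrative choice among those the text allows, and the probe sequence is hypothetical.

```python
from typing import Optional

# One illustrative substitution rule: replace the base with its complement.
SUBSTITUTE = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match: str, position: Optional[int] = None) -> str:
    """Copy of a perfect-match probe with a single substituted base.

    `position` is 0-based and defaults to the central base.
    """
    if position is None:
        position = len(perfect_match) // 2
    base = perfect_match[position]
    return perfect_match[:position] + SUBSTITUTE[base] + perfect_match[position + 1:]


pm = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer
mm = mismatch_probe(pm)      # substitution at 0-based index 10, i.e. position 11
print(mm)                    # ACGTACGTACCTACGTACGT
```

For a twenty-mer the default index 10 corresponds to position 11, inside the central window of positions 6 through 14.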

Mismatch probes thus provide a control for non-specific binding or cross-hybridization
to a nucleic acid in the sample other than the target to which the probe is
directed. Mismatch probes also indicate whether a hybridization is specific or not. For
example, if the target is present, the perfect match probes should be consistently brighter
than the mismatch probes. In addition, if all central mismatches are present, the mismatch
probes can be used to detect a mutation. The difference in intensity between the perfect
match and the mismatch probe (I(PM) - I(MM)) provides a good measure of the
concentration of the hybridized material.
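The PM/MM comparison can be summarized per gene as in the sketch below, which reports the mean I(PM) - I(MM) difference across probe pairs and the fraction of pairs in which the perfect match is brighter. Combining probe pairs by a simple mean is an assumption; the intensities are hypothetical.

```python
import numpy as np


def pm_mm_summary(pm, mm):
    """Mean PM - MM difference and fraction of pairs where PM exceeds MM."""
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    diff = pm - mm
    return float(diff.mean()), float((pm > mm).mean())


# Hypothetical intensities for four probe pairs targeting one gene.
avg_diff, frac_pm_brighter = pm_mm_summary([1200, 900, 1500, 1100],
                                           [300, 400, 200, 1150])
print(avg_diff, frac_pm_brighter)  # 662.5 0.75
```

A low fraction of brighter perfect-match probes would suggest that the signal is dominated by non-specific or cross-hybridization rather than by the intended target.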


Nucleic Acid Samples 

As is apparent to one of ordinary skill in the art, nucleic acid samples used in the
methods and assays of the invention may be prepared by any available method or process.
Methods of isolating total mRNA are also well known to those of skill in the art. For
example, methods of isolation and purification of nucleic acids are described in detail in
Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24:
Hybridization With Nucleic Acid Probes: Theory and Nucleic Acid Probes, P. Tijssen, ed.,
Elsevier Press, New York (1993). Such samples include RNA samples, but also include
cDNA synthesized from an mRNA sample isolated from a cell or tissue of interest. Such
samples also include DNA amplified from the cDNA, and RNA transcribed from the
amplified DNA. One of skill in the art would appreciate that it may be desirable to inhibit
or destroy RNase present in homogenates before the homogenates are used.

Biological samples may be of any biological tissue or fluid or cells from any
organism, as well as cells raised in vitro, such as cell lines and tissue culture cells.
Frequently the sample will be a "clinical sample," which is a sample derived from a patient.
Typical clinical samples include, but are not limited to, breast tissue biopsy, sputum, blood,
blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid,
and pleural fluid, or cells therefrom.

Biological samples may also include sections of tissues, such as frozen sections or
formalin-fixed sections taken for histological purposes.

Solid Supports 

Solid supports containing oligonucleotide probes for differentially expressed genes
can be any solid or semisolid support material known to those skilled in the art. Suitable
examples include, but are not limited to, membranes, filters, tissue culture dishes, polyvinyl
chloride dishes, beads, test strips, silicon or glass based chips and the like. Suitable glass
wafers and hybridization methods are widely available, for example, those disclosed by
Beattie (WO 95/11755). Any solid surface to which oligonucleotides can be bound, either
directly or indirectly, either covalently or non-covalently, can be used. In some
embodiments, it may be desirable to attach some oligonucleotides covalently and others
non-covalently to the same solid support.

A preferred solid support is a high density array or DNA chip. Such arrays contain each
particular oligonucleotide probe at a predetermined location on the array. Each
predetermined location may contain more than one molecule of the probe, but each
molecule within the predetermined location has an identical sequence. Such predetermined
locations are termed features. There may be, for example, from 2, 10, 100, 1000 to 10,000,
100,000 or 400,000 of such features on a single solid support. The solid support, or the area
within which the probes are attached, may be on the order of a square centimeter.

Oligonucleotide probe arrays for expression monitoring can be made and used
according to any techniques known in the art (see, for example, Lockhart et al., Nat.
Biotechnol. 14, 1675-1680 (1996); McGall et al., Proc. Natl. Acad. Sci. USA 93, 13555-13560
(1996)). Such probe arrays may contain at least two or more oligonucleotides that are
complementary to or hybridize to two or more of the genes described herein. Such arrays
may also contain oligonucleotides that are complementary to or hybridize to at least 3, 4, 5, 6,
7, 8, 9, 10, 20, 30, 50, 70 or more of the genes described herein.

Methods of forming high density arrays of oligonucleotides with a minimal number
of synthetic steps are known. The oligonucleotide analogue array can be synthesized on a
solid substrate by a variety of methods, including, but not limited to, light-directed chemical
coupling and mechanically directed coupling (see Pirrung et al., U.S. Patent No.
5,143,854 (1992); Fodor et al., U.S. Patent No. 5,800,992 (1998); Chee et al., U.S. Patent
No. 5,837,832 (1998)).

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a
glass surface proceeds using automated phosphoramidite chemistry and chip masking
techniques. In one specific implementation, a glass surface is derivatized with a silane
reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a
photolabile protecting group. Photolysis through a photolithographic mask is used
selectively to expose functional groups which are then ready to react with incoming
5'-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those
sites which are illuminated (and thus exposed by removal of the photolabile blocking
group). Thus, the phosphoramidites only add to those areas selectively exposed in the
preceding step. These steps are repeated until the desired array of sequences has been
synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide
analogues at different locations on the array is determined by the pattern of illumination
during synthesis and the order of addition of coupling reagents.
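The masking logic of light-directed combinatorial synthesis can be simulated in a few lines. In the sketch below each coupling cycle adds one base, and its hypothetical "mask" is simply the list of feature positions whose next required base matches that base. It models only which sites are illuminated in each cycle, not the photochemistry; names and sequences are illustrative.

```python
from typing import List, Tuple


def synthesis_schedule(sequences: List[str],
                       order: str = "ACGT") -> List[Tuple[str, List[int]]]:
    """Simulate which features are illuminated in each coupling cycle.

    A cycle for a given base deprotects (illuminates) only the features whose
    next required base is that base, so only those sites are extended.
    Returns (base, illuminated feature indices) for every cycle that
    illuminates at least one feature.
    """
    built = [""] * len(sequences)
    cycles = []
    for _ in range(max((len(s) for s in sequences), default=0)):
        # One pass through the four bases extends every unfinished feature by at least one base.
        for base in order:
            exposed = [i for i, s in enumerate(sequences)
                       if len(built[i]) < len(s) and s[len(built[i])] == base]
            for i in exposed:
                built[i] += base
            if exposed:
                cycles.append((base, exposed))
    return cycles


schedule = synthesis_schedule(["AC", "CA", "GT"])
for base, features in schedule:
    print(base, features)
# A [0] / C [0, 1] / G [2] / T [2] / A [1]
```

The number of cycles is bounded by four times the longest probe length, regardless of how many different sequences are synthesized in parallel.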

In addition to the foregoing, additional methods which can be used to generate an
array of oligonucleotides on a single substrate are described in Fodor et al., WO 93/09668.
High density nucleic acid arrays can also be fabricated by depositing pre-made or natural
nucleic acids in predetermined positions. Synthesized or natural nucleic acids are deposited
on specific locations of a substrate by light-directed targeting and oligonucleotide-directed
targeting. Another embodiment uses a dispenser that moves from region to region to
deposit nucleic acids in specific spots.


Hybridization 

Nucleic acid hybridization simply involves contacting a probe and target nucleic
acid under conditions where the probe and its complementary target can form stable hybrid
duplexes through complementary base pairing (see Lockhart et al., WO 99/32660). The
nucleic acids that do not form hybrid duplexes are then washed away, leaving the hybridized
nucleic acids to be detected, typically through detection of an attached detectable label. It is
generally recognized that nucleic acids are denatured by increasing the temperature or
decreasing the salt concentration of the buffer containing the nucleic acids. Under low
stringency conditions (e.g., low temperature and/or high salt), hybrid duplexes (e.g.,
DNA-DNA, RNA-RNA or RNA-DNA) will form even where the annealed sequences are not
perfectly complementary. Thus, specificity of hybridization is reduced at lower stringency.
Conversely, at higher stringency (e.g., higher temperature or lower salt), successful
hybridization tolerates fewer mismatches. One of skill in the art will appreciate that
hybridization conditions may be selected to provide any degree of stringency. In a preferred
embodiment, hybridization is performed at low stringency, in this case in 6x SSPE-T
(0.005% Triton X-100) at 37°C, to ensure hybridization, and subsequent washes are then
performed at higher stringency (e.g., 1x SSPE-T at 37°C) to eliminate mismatched hybrid
duplexes. Successive washes may be performed at increasingly higher stringency (e.g.,
down to as low as 0.25x SSPE-T at 37°C to 50°C) until a desired level of hybridization
specificity is obtained. Stringency can also be increased by addition of agents such as
formamide. Hybridization specificity may be evaluated by comparison of hybridization to
the test probes with hybridization to the various controls that can be present (e.g.,
expression level control, normalization control, mismatch controls, etc.).
In general, there is a tradeoff between hybridization specificity (stringency) and
signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest
stringency that produces consistent results and that provides a signal intensity greater than
approximately 10% of the background intensity. To find this stringency, the
hybridized array may be washed with solutions of successively higher stringency and read
between each wash. Analysis of the data sets thus produced will reveal a wash stringency
above which the hybridization pattern is not appreciably altered and which provides
adequate signal for the particular oligonucleotide probes of interest.
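The wash-selection procedure above can be sketched in Python as follows. This is a minimal illustration, not the procedure of the specification: the `WashRead` record, the `pick_wash` helper and the example intensities are hypothetical. Reads are assumed to be listed in the order the washes were performed (least to most stringent), and only the signal criterion (signal above about 10% of background) is modeled. The comparison of hybridization patterns between washes is left to the analyst.

```python
from typing import NamedTuple, Optional, Sequence


class WashRead(NamedTuple):
    """Scan of the array after one wash (hypothetical record layout)."""
    label: str         # wash condition, e.g. "1x SSPE-T, 37 C"
    signal: float      # mean probe signal intensity after this wash
    background: float  # mean background intensity after this wash


def pick_wash(reads: Sequence[WashRead], min_fraction: float = 0.10) -> Optional[WashRead]:
    """Return the most stringent wash whose signal exceeds min_fraction * background.

    `reads` must be ordered from least to most stringent, i.e. in the order in
    which the successive washes were performed and scanned.
    """
    chosen = None
    for read in reads:
        if read.signal > min_fraction * read.background:
            chosen = read  # a later (more stringent) qualifying wash replaces earlier ones
    return chosen


# Invented example: signal falls as stringency rises.
reads = [
    WashRead("6x SSPE-T, 37 C", signal=900.0, background=100.0),
    WashRead("1x SSPE-T, 37 C", signal=400.0, background=90.0),
    WashRead("0.25x SSPE-T, 50 C", signal=5.0, background=80.0),
]
best = pick_wash(reads)
print(best.label)
```

In this invented example the 1x SSPE-T wash is selected, because the signal after the 0.25x wash (5) falls below 10% of its background (8).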



Signal Detection

The hybridized nucleic acids are typically detected by detecting one or more labels
attached to the sample nucleic acids. The labels may be incorporated by any of a number of
means well known to those of skill in the art (see Lockhart et al., WO 99/32660).



Databases

The present invention includes relational databases containing sequence information,
for instance for one or more of the genes of Tables 1-5, as well as gene expression
information in various breast tissue samples. Databases may also contain information
associated with a given sequence or tissue sample, such as descriptive information about the
gene associated with the sequence information, descriptive information concerning the
clinical status of the tissue sample, or information concerning the patient from whom the
sample was derived. The database may be designed to include different parts, for instance a
sequence database and a gene expression database. Methods for the configuration and
construction of such databases are widely available; see, for instance, Akerblom et al.
(1999), U.S. Patent No. 5,953,727, which is specifically incorporated herein by reference in
its entirety.

The databases of the invention may be linked to an outside or external database. In a
preferred embodiment, as described in Tables 1-5, the external database is GenBank and the
associated databases maintained by the National Center for Biotechnology Information
(NCBI).
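As a rough illustration of how a sequence table, a sample table and an expression table might be related and queried, the following sketch uses Python's built-in `sqlite3` module. The schema, column names, the placeholder accession `AA000001` and the expression values are all hypothetical. They are not taken from the databases of the invention or from Tables 1-5.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    genbank     TEXT UNIQUE NOT NULL,  -- accession linking to the external GenBank/NCBI record
    description TEXT
);
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    diagnosis TEXT NOT NULL            -- clinical status, e.g. 'normal' or 'ILC'
);
CREATE TABLE expression (
    gene_id   INTEGER NOT NULL REFERENCES gene(gene_id),
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    avg_diff  REAL NOT NULL,           -- average difference value for this gene in this sample
    PRIMARY KEY (gene_id, sample_id)
);
"""


def mean_expression_by_diagnosis(conn, genbank):
    """Return {diagnosis: mean average difference} for one GenBank accession."""
    rows = conn.execute(
        """
        SELECT s.diagnosis, AVG(e.avg_diff)
        FROM expression AS e
        JOIN gene AS g ON g.gene_id = e.gene_id
        JOIN sample AS s ON s.sample_id = e.sample_id
        WHERE g.genbank = ?
        GROUP BY s.diagnosis
        """,
        (genbank,),
    ).fetchall()
    return dict(rows)


# Build a tiny in-memory database with placeholder data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO gene VALUES (1, 'AA000001', 'placeholder gene')")
conn.executemany("INSERT INTO sample VALUES (?, ?)", [(1, "normal"), (2, "normal"), (3, "ILC")])
conn.executemany(
    "INSERT INTO expression VALUES (1, ?, ?)", [(1, 200.0), (2, 240.0), (3, 80.0)]
)
summary = mean_expression_by_diagnosis(conn, "AA000001")
print(summary)
```

Keeping the GenBank accession as a unique column is one simple way to support the link to an external database described above. A production design would likely add indexes and further clinical fields.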

Any appropriate computer platform may be used to perform the necessary
comparisons between sequence information, gene expression information and any other
information in the database or provided as an input. For example, a large number of
computer workstations are available from a variety of manufacturers, such as those
available from Silicon Graphics. Client-server environments, database servers and
networks are also widely available and are appropriate platforms for the databases of the
invention.

The databases of the invention may be used to produce, among other things,
electronic Northern blots (E-northerns) that allow the user to determine the cell type or tissue
in which a given gene is expressed and to determine the abundance or expression level of
a given gene in a particular tissue or cell. E-northern analysis can be used as a tool to
discover tissue-specific candidate therapeutic targets that are not over-expressed in tissues
such as the liver, kidney, or heart. Targets expressed in these tissue types often lead to
detrimental side effects once drugs are developed, so a first-pass screen that eliminates such
targets early in the target discovery and validation process would be beneficial.
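A first-pass screen of this kind might look like the sketch below. The tissue list follows the text (liver, kidney, heart), but the `passes_first_pass_screen` helper, the idea of comparing each side-effect tissue against the target tissue, and the 0.5 ratio cut-off are assumptions chosen for illustration.

```python
SIDE_EFFECT_TISSUES = ("liver", "kidney", "heart")


def passes_first_pass_screen(tissue_expr, target_tissue="breast", max_ratio=0.5):
    """Keep a candidate target only if it is not over-expressed in side-effect tissues.

    tissue_expr maps tissue name -> expression level from an E-northern query.
    Tissues missing from the mapping are treated as not expressing the gene.
    A side-effect tissue fails the screen if its level exceeds
    max_ratio * (level in the target tissue); max_ratio is an assumed cut-off.
    """
    target = tissue_expr.get(target_tissue, 0.0)
    if target <= 0:
        return False  # not expressed in the tissue of interest
    return all(
        tissue_expr.get(tissue, 0.0) <= max_ratio * target
        for tissue in SIDE_EFFECT_TISSUES
    )


print(passes_first_pass_screen({"breast": 300.0, "liver": 20.0}))   # True: kept
print(passes_first_pass_screen({"breast": 300.0, "heart": 250.0}))  # False: screened out
```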

The databases of the invention may also be used to present information identifying
the expression level, in a tissue or cell, of a set of genes comprising at least one gene in
Tables 1-5, by comparing the expression level of at least one gene in Tables 1-5 in the tissue
to the level of expression of that gene in the database. Such methods may be used to predict
the physiological state of a given tissue by comparing the level of expression of a gene or
genes in Tables 1-5 in a sample to the expression levels found in normal breast tissue, in
breast carcinoma tissue, or in both. Such methods may also be used in the drug or agent
screening assays described herein.
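One simple way to compare a sample against reference expression levels is a nearest-profile rule, sketched below. This is an assumed technique (Euclidean distance to each reference profile), not the method the specification prescribes. The gene names, reference levels and the `classify_by_reference` helper are hypothetical.

```python
import math


def classify_by_reference(sample, references):
    """Return the label of the reference profile closest to `sample`.

    sample:     {gene: expression level} measured in the tissue being tested.
    references: {label: {gene: reference level}}, e.g. database means for
                normal breast tissue and for breast carcinoma.
    Only genes measured in the sample and present in every reference are used,
    and closeness is plain Euclidean distance (an assumption of this sketch).
    """
    shared = [g for g in sample if all(g in ref for ref in references.values())]
    if not shared:
        raise ValueError("no genes shared between the sample and the references")
    x = [sample[g] for g in shared]
    distances = {
        label: math.dist(x, [ref[g] for g in shared])
        for label, ref in references.items()
    }
    return min(distances, key=distances.get)


references = {  # invented reference levels for two hypothetical genes
    "normal": {"gene_a": 200.0, "gene_b": 50.0},
    "carcinoma": {"gene_a": 60.0, "gene_b": 300.0},
}
print(classify_by_reference({"gene_a": 80.0, "gene_b": 250.0}, references))
```

On raw expression values, highly expressed genes dominate the distance. Log-transforming or standardizing each gene first is a common alternative.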


Kits 

The invention further includes kits combining, in different combinations, high-density
oligonucleotide arrays, reagents for use with the arrays, signal detection and array-processing
instruments, gene expression databases, and the analysis and database management
software described above. The kits may be used, for example, to monitor the progression of
breast cancer, to identify genes that show promise as new drug targets, and to screen known
and newly designed drugs as discussed above.

The databases packaged with the kits are typically a compilation of expression
patterns from human breast cancer tissue or cell lines for the genes and gene fragments
described herein (corresponding to the genes of Tables 1-5). In particular, the database
software and packaged information include the expression results of Tables 1-5, which can be
used to predict the cancerous state of a tissue sample by comparing the expression levels of
the genes in the tissue or cell sample to the expression levels presented in Tables 1-5.
The kits may be used in the pharmaceutical industry, where the need for early drug
testing is strong due to the high costs associated with drug development, but where
bioinformatics, in particular gene expression informatics, is still lacking. These kits will
reduce the costs, time and risks associated with traditional new drug screening using cell
cultures and laboratory animals. The results of large-scale drug screening of pre-grouped
patient populations (pharmacogenomic testing) can also be applied to select drugs with
greater efficacy and fewer side effects. The kits may also be used by smaller biotechnology
companies and research institutes that do not have the facilities for performing such
large-scale testing themselves.

Databases and software designed for use with microarrays are discussed in
Balaban et al. (2001), U.S. Patent Nos. 6,229,911, a computer-implemented method for
managing information, stored as indexed tables, collected from small or large numbers of
microarrays, and 6,185,561, a computer-based method with data mining capability for
collecting gene expression level data, adding additional attributes and reformatting the data
to produce answers to various queries. Chee et al. (1999), U.S. Patent No. 5,974,164,
disclose a software-based method for identifying mutations in a nucleic acid sequence based
on differences in probe fluorescence intensities between wild-type and mutant sequences
that hybridize to reference sequences. The object of the method is to predict regions or
positions of mutation.


Without further description, it is believed that one of ordinary skill in the art can,
using the preceding description and the following illustrative examples, make and utilize the
compounds of the present invention and practice the claimed methods. The following
working examples, therefore, are illustrative only and should not be construed as limiting
the scope of the invention in any way.

Examples 

Example 1: Preparation of Breast Cancer Profiles 
Tissue Sample Acquisition and Preparation 
The patient tissue samples were derived from female patients; the average age was
39 years for the normal samples and 52 years for the tumor samples. The patients were of three
different ethnic origins (Caucasian, African-American, and Asian). Furthermore, all tissue
samples from infiltrating ductal carcinoma (IDC) patients were studied for cancer-related
expression, as 85% of the breast cancer patients were afflicted with this form of the
disease. The samples comprise normal, benign, DCIS (ductal carcinoma in situ),
microinvasive, stage I, stage II, and stage III breast cancer samples.

Histological analysis of each of the tissue samples was performed and samples were
segregated into either normal or malignant categories. The normal tissue samples were
acquired from neighboring tissue of patients suffering from one of the following disorders:
macromastia, mild fibrosis, infiltrating lobular carcinoma, or infiltrating ductal carcinoma;
however, each tissue was diagnosed as normal by histological analysis.

With minor modifications, the sample preparation protocol followed the Affymetrix
GeneChip Expression Analysis Manual. Frozen tissue was first ground to powder using the
Spex Certiprep 6800 Freezer Mill. Total RNA was then extracted using Trizol (Life
Technologies). The total RNA yield for each sample (average tissue weight of 300 mg) was
200-500 µg. Next, mRNA was isolated using the Oligotex mRNA Midi kit (Qiagen). Since
the mRNA was eluted in a final volume of 400 µl, an ethanol precipitation step was
required to bring the concentration to 1 µg/µl. Using 1-5 µg of mRNA, double-stranded
cDNA was created using the Superscript Choice system (Gibco-BRL). First-strand cDNA
synthesis was primed with a T7-(dT24) oligonucleotide. The cDNA was then phenol-chloroform
extracted and ethanol precipitated to a final concentration of 1 µg/µl.

From 2 µg of cDNA, cRNA was synthesized according to standard procedures. To
biotin-label the cRNA, the nucleotides Bio-11-CTP and Bio-16-UTP (Enzo Diagnostics) were
added to the reaction. After a six-hour incubation at 37°C, the labeled cRNA was cleaned
up according to the RNeasy Mini kit protocol (Qiagen). The cRNA was then fragmented
for thirty-five minutes at 94°C in fragmentation buffer (5x fragmentation buffer: 200 mM
Tris-acetate (pH 8.1), 500 mM KOAc, 150 mM MgOAc).

Fragmented cRNA (55 µg) was hybridized on the human and the Human Genome
U95 set of arrays for twenty-four hours at 60 rpm in a 45°C hybridization oven. The chips
were washed and stained with streptavidin phycoerythrin (SAPE) (Molecular Probes) in
Affymetrix fluidics stations. To amplify staining, SAPE solution was added twice, with an
anti-streptavidin biotinylated antibody (Vector Laboratories) staining step in between.
Hybridization to the probe arrays was detected by fluorometric scanning (Hewlett Packard
Gene Array Scanner). Following hybridization and scanning, the microarray images were
analyzed for quality control, looking for major chip defects or abnormalities in
hybridization signal. After all chips passed QC, the data were analyzed using Affymetrix
GeneChip software (v3.0) and Experimental Data Mining Tool (EDMT) software (v1.0).
Gene Expression Analysis

All samples were prepared as described and hybridized onto the Affymetrix Human
Genome U95 array. Each chip contains 16-20 oligonucleotide probe pairs per gene or
cDNA clone. These probe pairs include perfectly matched sets and mismatched sets, both
of which are necessary for the calculation of the average difference. The average difference
is a measure of the intensity difference for each probe pair, calculated by subtracting the
intensity of the mismatch from the intensity of the perfect match and averaging across the
probe pairs for a gene. This takes into consideration variability in hybridization among
probe pairs and other hybridization artifacts that could affect the fluorescence intensities.
Using the calculated average difference value, an absolute call is made for each gene or EST.
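The average difference calculation can be sketched as follows. This minimal version simply averages PM minus MM over the probe pairs of one gene. The vendor software also handles outlier probe pairs, which is not reproduced here, and the intensities in the example are invented.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of (perfect match - mismatch) intensity over one gene's probe pairs.

    pm, mm: intensities of the perfect-match and mismatch probes, paired by index.
    Outlier probe pairs are not excluded in this simplified version.
    """
    if not pm or len(pm) != len(mm):
        raise ValueError("pm and mm must be non-empty and of equal length")
    return mean(p - m for p, m in zip(pm, mm))


print(average_difference([300, 250, 400], [100, 150, 100]))
```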

The absolute call of present, absent or marginal is used to generate a Gene
Signature, a tool used to identify those genes that are commonly present or commonly
absent in a given sample set, according to the absolute call. For each set of samples, a
median average difference was calculated from the average differences of each individual
sample within the set. The median average difference typically must be greater than 20 to
ensure that the expression level is at least two standard deviations above the background
noise of the hybridization. For the purposes of this study, only the genes and gene
fragments with a median average difference greater than 20 were studied further in detail.

The Gene Signature for one set of samples is compared to the Gene Signature of
another set of samples to determine the Gene Signature Differential. This comparison
identifies the genes that are consistently present in one set of samples and consistently
absent in the second set of samples.
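The Gene Signature, the median average difference filter, and the Gene Signature Differential described above can be expressed with ordinary set operations. In this sketch, absolute calls are assumed to be stored as "P", "A" or "M" per gene per sample. The helper names and the toy data are hypothetical.

```python
from statistics import median


def positive_signature(calls):
    """Genes called present ('P') in every sample of a set."""
    sets = [{g for g, c in sample.items() if c == "P"} for sample in calls]
    return set.intersection(*sets) if sets else set()


def negative_signature(calls):
    """Genes called absent ('A') in every sample of a set."""
    sets = [{g for g, c in sample.items() if c == "A"} for sample in calls]
    return set.intersection(*sets) if sets else set()


def median_filter(avg_diffs_by_gene, threshold=20.0):
    """Genes whose median average difference across the set exceeds threshold."""
    return {g for g, values in avg_diffs_by_gene.items() if median(values) > threshold}


def signature_differential(calls_a, calls_b):
    """Genes consistently present in set A and consistently absent in set B."""
    return positive_signature(calls_a) & negative_signature(calls_b)


# Toy absolute calls for two sample sets.
tumor = [{"g1": "P", "g2": "P", "g3": "A"}, {"g1": "P", "g2": "M", "g3": "A"}]
normal = [{"g1": "A", "g2": "A", "g3": "P"}, {"g1": "A", "g2": "A", "g3": "P"}]
print(signature_differential(tumor, normal))
```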

The Gene Signature Curve is a graphic view of the number of genes consistently
present in a given set of samples as the sample size increases, taking into account the genes
commonly expressed among a particular set of samples and discounting those genes whose
expression is variable among those samples. The curve also indicates the number of
samples necessary to generate an accurate Gene Signature. As the sample number
increases, the number of genes common to the sample set decreases. The curve is generated
from the positive Gene Signatures of the samples in question by adding one
sample at a time to the Gene Signature, beginning with the sample with the smallest number
of present genes and adding samples in ascending order. The curve displays the sample size
required for the most consistency and the least amount of expression variability from
sample to sample. The point where this curve begins to level off represents the minimum
number of samples required for the Gene Signature. The x-axis shows the number of
samples in the set, and the y-axis shows the number of genes in the positive Gene Signature.
As a general rule, the acceptable percent variability in the number of positive genes
between two sample sets should be less than 5%.
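The construction of the Gene Signature Curve can be sketched as below. Using the 5% figure as the "levelling off" tolerance in `plateau_size` is an assumption borrowed from the variability rule above, not a definition given in the text. The helper names are hypothetical.

```python
def signature_curve(present_sets):
    """Points (n, size of the positive Gene Signature after n samples).

    present_sets: one collection of present-called genes per sample. Samples are
    added in ascending order of their number of present genes, and the
    signature after n samples is the set of genes present in all n of them.
    """
    curve, common = [], None
    for n, genes in enumerate(sorted(present_sets, key=len), start=1):
        genes = set(genes)
        common = genes if common is None else common & genes
        curve.append((n, len(common)))
    return curve


def plateau_size(curve, tolerance=0.05):
    """Smallest n at which adding one more sample shrinks the signature by < tolerance.

    Returns None if the curve never levels off within the samples provided.
    """
    for (n, size), (_, next_size) in zip(curve, curve[1:]):
        if size and (size - next_size) / size < tolerance:
            return n
    return None


print(signature_curve([{"g1", "g2", "g3"}, {"g1", "g2", "g4", "g5"}, {"g1", "g3", "g4", "g5", "g6"}]))
```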


Fold Change analysis 

The data were first filtered to exclude all genes that showed no expression in any of
the samples. The ratio (tumor/normal) was calculated by comparing the mean expression
value for each gene in the breast cancer sample set against the mean expression value of that
gene in the normal breast sample set. For Table 2, genes were included in the analysis if
they had a fold change > 3 in either direction and a p-value < 0.05 as determined by a
two-tailed unequal-variance t-test. Out of the ~60,000 genes surveyed by the Human Genome U95
set, 802 genes were present in the overall fold change analysis.
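The fold change filter can be sketched with SciPy's `scipy.stats.ttest_ind`, which with `equal_var=False` performs Welch's unequal-variance t-test (two-sided by default). The convention of reporting ratios below 1 as -1/ratio is an assumption suggested by the negative fold changes quoted later in the text (for example -17.7). Skipping genes whose group mean is not positive is also an assumption, since a ratio of means is undefined there. The toy data are invented.

```python
from statistics import mean

from scipy.stats import ttest_ind


def signed_fold_change(tumor, normal):
    """Tumor/normal ratio of means, reported as -1/ratio when below 1 (assumed convention)."""
    ratio = mean(tumor) / mean(normal)
    return ratio if ratio >= 1 else -1.0 / ratio


def select_genes(tumor_by_gene, normal_by_gene, min_fold=3.0, alpha=0.05):
    """Return {gene: (signed fold change, p-value)} for genes passing both cut-offs."""
    selected = {}
    for gene, tumor in tumor_by_gene.items():
        normal = normal_by_gene[gene]
        # A ratio of means is undefined unless both group means are positive;
        # such genes are skipped here (an assumption of this sketch).
        if mean(tumor) <= 0 or mean(normal) <= 0:
            continue
        fold = signed_fold_change(tumor, normal)
        # Two-tailed t-test without assuming equal variances (Welch's test).
        _, p_value = ttest_ind(tumor, normal, equal_var=False)
        if abs(fold) > min_fold and p_value < alpha:
            selected[gene] = (fold, float(p_value))
    return selected


tumor = {"up": [100, 110, 120, 105], "down": [10, 12, 11, 9], "flat": [100, 105, 98, 102]}
normal = {"up": [20, 25, 22, 18], "down": [100, 110, 95, 105], "flat": [101, 99, 103, 100]}
print(select_genes(tumor, normal))
```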

Expression Profiles of Genes Differentially Expressed in Breast Cancer

Using the methods described above, genes that were predominantly over-expressed
or predominantly under-expressed in breast cancer were identified. Genes
with consistent differential expression patterns provide potential targets for broad-range
diagnostics and therapeutics. For simplicity, applicants examined known genes by the
hierarchical cluster analysis developed by Eisen and colleagues to determine whether functionally
related genes would cluster together (see Eisen et al., Proc Natl Acad Sci USA 95,
14863-14868 (1998)).
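A hierarchical clustering step in the spirit of Eisen et al. can be sketched with SciPy: average linkage over correlation distance (1 minus Pearson r) between gene profiles. The exact distance, linkage and cut-off settings used by the applicants are not stated, so `max_distance=0.3` and the toy matrix below are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_genes(expr, max_distance=0.3):
    """Assign cluster labels to genes with correlated expression profiles.

    expr: 2-D array-like, one row per gene and one column per sample.
    Genes are joined by average linkage on correlation distance
    (1 - Pearson r); clusters are cut where the merge distance exceeds
    max_distance, an assumed threshold.
    """
    z = linkage(np.asarray(expr, dtype=float), method="average", metric="correlation")
    return fcluster(z, t=max_distance, criterion="distance")


labels = cluster_genes([
    [1, 2, 3, 4],
    [2, 4, 6, 8.5],   # rises with the first gene
    [4, 3, 2, 1],
    [8, 6, 4, 2.1],   # falls with the third gene
])
print(labels)
```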

Table 2 lists the genes determined to be differentially expressed in cancerous breast
tissues compared to normal breast tissue, with the fold change value for each gene. These
genes, or subsets of these genes, comprise an overall breast cancer gene expression profile.

The well-characterized breast cancer proliferation marker KI-67 (see Gerdes, Semin
Cancer Biol 1, 199-206 (1990)) had an average fold change value of 2.8, calculated from the
15 IDC tissue samples analyzed. Because this fold change was below the 3-fold criterion used
here, KI-67 is not presented in Table 2. Table 2 does include some genes previously shown in
the literature to be over- or under-expressed in breast cancer, such as cytokeratins 5, 14, 15
and 17, maspin, MMP 9 and 11, fibronectin, and pituitary tumor transforming 1, as well as
some genes, such as p57(kip2), p63/p51/KET, mitosin, and pCDC55, whose expression levels
were not previously known to vary in breast cancer.
The pituitary tumor transforming 1 gene has been shown to produce in vitro and in
vivo tumor-inducing activity (see Zhang et al., Mol Endocrinol 13, 156-66 (1999)). In a
recent publication, pituitary tumor transforming 1 was shown to be over-expressed in
mammary adenocarcinomas (see Saez et al., Oncogene 18, 5473-6 (1999)). Another
study found that all 48 colon carcinomas examined over-expressed PTTG1 compared
to normal colorectal tissue, and that invasion of the surrounding tissue was associated
with higher PTTG1 expression levels (see Heaney et al., Expression of pituitary-tumour
transforming gene in colorectal tumours, Lancet 355, 716-9 (2000)).

Genes listed in Table 2 that have not been reported in the literature to be over-expressed
in human breast cancer tissues include RAD2, FLS353, CKS2, cyclin-selective ubiquitin carrier
protein E2-C, ZWINT, Lamin B1 and H2A.X. Although FLS353 was recently found
to be over-expressed in colorectal cancer (see Hufton et al., FEBS Lett 463, 77-82 (1999)),
differential expression of FLS353 in breast tumor cells had not previously been
demonstrated.

Cyclin-ubiquitin carrier protein E2-C is another gene whose over-expression in breast
cancer was discovered in this study. Previous studies have shown that when a
dominant-negative form of the protein is over-expressed, mammalian cells arrest in M
phase, and have provided evidence that this mutant form of cyclin-ubiquitin carrier
protein E2-C blocks the destruction of both cyclin A and cyclin B (see Townsley et al.,
Proc Natl Acad Sci USA 94, 2362-7 (1997)).

The expression levels of the genes in Tables 4 and 5 are associated with various
stages of infiltrating ductal carcinoma (Table 4) or infiltrating lobular carcinoma (Table 5).
The tables present the fold change in expression in the particular disease state
compared to normal breast tissue. The genes in these tables may be used alone, or in
combination with those listed in Tables 1-3, in the methods, compositions, databases and
computer systems of the invention.

Example 2: Diagnostic Subset of Breast Cancer Associated Genes 

Table 1 lists the members of a diagnostic subset of genes selected by p-value. This
group of genes can be used to differentiate between normal/benign and breast tumor tissue
samples, including two DCIS samples. Assays using these genes are capable of
distinguishing between normal and tumor samples with nearly 100% efficiency (see Figure 6).
When this set of genes was used to analyze the 33-sample set, only 1 of the 33 samples
was misclassified as normal based on its gene expression profile (see Figure 7).

Figures 6 and 7 are three-dimensional plots displaying the relationship of variance
derived from gene expression data obtained from patient samples. In Figure 6, normal
tissue samples are displayed as darker spheres and the infiltrating ductal carcinoma tissue
samples as lighter spheres. The x-axis represents the first principal component, which
contains the greatest share of the variance in the data (80%). The y-axis represents the second
principal component (4%), and the z-axis the third principal component (3%). Figure 7
displays the results obtained from a separate 33-sample set composed of new
samples that have no relation to the 28-sample set used to discover the gene set of Table
1. Again, the x-, y-, and z-axes represent the first (63%), second (10%), and third (6%)
principal components, respectively.

The gene set of Table 1 can thus be used to distinguish normal from cancerous 
breast tissue. 
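A principal component analysis of the kind plotted in Figures 6 and 7 can be sketched with NumPy's singular value decomposition. The sketch centers each gene but does not scale it; whether the applicants standardized their data is not stated, and `pca_scores` is a hypothetical helper.

```python
import numpy as np


def pca_scores(expr, n_components=3):
    """Project samples onto their leading principal components.

    expr: 2-D array-like, one row per sample and one column per gene.
    Returns (scores, explained): the sample coordinates on the first
    n_components components and the fraction of total variance each explains.
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=0)                      # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    variance = s ** 2                           # proportional to variance per component
    scores = u[:, :n_components] * s[:n_components]
    return scores, variance[:n_components] / variance.sum()


scores, explained = pca_scores([[0, 0], [1, 1], [2, 2], [3, 3]], n_components=2)
print(explained)
```

Plotting the first three columns of `scores` for each sample, colored by diagnosis, gives a view comparable in form to Figures 6 and 7.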


Example 3: Myoepithelial and Luminal Cell Marker Genes Examined on a Global Scale 

Previous studies have indicated that myoepithelial cells express both epithelial and
smooth muscle gene expression markers, while luminal epithelial cells fail to express these
genes (see Lazard et al., Proc Natl Acad Sci USA 90, 999-1003 (1993)). Cluster analysis
grouped 35 fragments representing 31 genes into one highly correlative cluster; these
genes and ESTs are listed in Table 3.

Previous studies have indicated that calponin and myosin heavy chain are expressed
in smooth muscle cells and myoepithelial cells, while luminal epithelium lacks expression
of these genes. Furthermore, these proteins are usually not expressed in invasive ductal
carcinoma of the breast (Lazard et al., supra). Both calponin (fold change -11) and myosin
heavy chain (fold change -10.8) were under-expressed in IDC. As indicated in Table 3,
other genes associated with smooth muscle were also under-expressed, such as smooth
muscle gamma-actin, myosin light chain kinase, myosin, heavy polypeptide 11, and
leiomodin 1. Neither myosin, heavy polypeptide 11 nor leiomodin 1 has previously been
reported to be under-expressed in breast cancer as compared to normal tissue samples.
The expression pattern represented in this particular cluster indicates that a
preponderance of tissue samples diagnosed as infiltrating ductal carcinoma exhibit a luminal
phenotype in which myoepithelial cells are absent. Further evidence supporting this finding
is the under-expression of cytokeratins 5, 14, 15, and 17 in the tumor samples, as
shown in Table 3. Normal myoepithelial cells express cytokeratins 5, 14, 15, and 17, and
breast carcinoma cells do not (Trask et al., Proc Natl Acad Sci USA 87, 2319-2323 (1990)).
A previous study indicated that myoepithelial cells are present in normal tissue, benign
lesions, and grade I infiltrating ductal carcinoma but are absent in carcinomas of grades II and
III (Gusterson et al., Cancer Res 42, 4763-4770 (1982)).

In addition, components of the basal lamina such as laminin were under-expressed in
infiltrating ductal carcinoma relative to normal tissue samples: both laminin S
B3 and laminin-related protein were under-expressed, as indicated in Table 3. Myoepithelial
and basal lamina markers have been reported to be useful in differentiating
microinvasive from ductal carcinomas of the breast (Damiani et al., Virchows Arch 434,
227-234 (1999)).

The set of 35 fragments representing 31 genes shown in Table 3 could distinguish
between intraductal carcinoma and microinvasive DCIS tissue samples based on the
disappearance of genes expressed in either the basal lamina or myoepithelial cells. There is
evidence in the literature that the collapse of the basement membrane, as well as the
disappearance of an intact myoepithelial cell layer, occurs during the invasion process. A
multi-gene screen using either of these sets of genes can be used to differentiate between
benign and invasive breast neoplasms based on the gene expression fingerprint elucidated in
this study.

Figure 8 shows the results of principal component analysis (PCA) of the 91-sample set
using all 35 fragments (representing 31 genes and ESTs) in Table 3. These results demonstrate
that PCA using the genes in Table 3 is able to distinguish between non-invasive and invasive
breast tissue samples. Figure 8 provides evidence that this group of genes is diagnostically
useful for differentiating DCIS samples that are intraductal (non-invasive) from those
containing microinvasion. As shown in Figure 8, this group of genes and ESTs is capable of
differentiating between two subtypes of DCIS and may constitute a more
sensitive predictor of a microinvasion phenotype.

Example 4: Discovery of Breast Tissue Specific Genes in IDC 
Electronic northern (E-northern) analysis determines, from a database of gene expression
information, whether a gene of interest is present in a tissue and, if it is present, at what
level. Expression levels were determined using a GeneChip set that evaluated the
expression levels of 60,000 genes in each of 28 different normal human tissue types.
Like multi-tissue northern blot analysis, E-northern analysis quickly
determines whether a gene of interest is expressed in a particular tissue type and at what level.
Because multiple samples of each tissue can be evaluated, the number of samples in a
particular group that express the gene of interest can be tabulated and analyzed statistically.
Multiple samples from the same tissue are not available at this time using conventional
multi-tissue northern blot analysis.
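The tabulation step of an E-northern analysis can be sketched as follows: for each tissue, count how many samples call the gene present and average their signal. The record layout and the `e_northern` helper are hypothetical, and the example records are invented.

```python
from collections import defaultdict


def e_northern(records, gene):
    """Tabulate, per tissue, how many samples call `gene` present and their mean signal.

    records: iterable of (tissue, sample_id, gene, call, avg_diff) tuples.
    Returns {tissue: (n_present, n_samples, mean_avg_diff_of_present)}; only
    tissues where the gene is present in at least one sample are reported.
    """
    counts = defaultdict(lambda: [0, 0, 0.0])  # [n_present, n_samples, summed signal]
    for tissue, _sample_id, g, call, value in records:
        if g != gene:
            continue
        entry = counts[tissue]
        entry[1] += 1
        if call == "P":
            entry[0] += 1
            entry[2] += value
    return {
        tissue: (n_present, n_samples, total / n_present)
        for tissue, (n_present, n_samples, total) in counts.items()
        if n_present > 0
    }


records = [
    ("breast", "s1", "G", "P", 300.0),
    ("breast", "s2", "G", "P", 200.0),
    ("breast", "s3", "G", "A", 5.0),
    ("liver", "s4", "G", "A", 3.0),
    ("breast", "s1", "H", "P", 50.0),
]
print(e_northern(records, "G"))
```

Reporting only tissues with at least one present call mirrors the statement elsewhere in this section that the E-northern display shows only tissues where the gene is detectable.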

E-northern analysis can be used as a tool to discover tissue-specific candidate
therapeutic targets that are not over-expressed in tissues such as the liver, kidney, or heart.
Targets expressed in these tissue types often lead to detrimental side effects once drugs are
developed, so a first-pass screen that eliminates such targets early in the target discovery
and validation process would be beneficial. Furthermore, different tissues have distinctive
gene expression profiles, related to parameters such as proliferation, differentiation, or the
cell types contained in the tissue, that can provide interesting clues to the biological roles
of the ESTs.

E-northern analysis was performed for many of the genes clustered in Table 2.
Analysis of the E-northerns revealed that most of the genes were expressed at elevated
levels in the thymus. There is a high rate of mitosis in the thymus during T-lymphocyte
maturation, and many proliferation-associated genes, such as CDC2, cyclin B1, and
topoisomerase II alpha, are expressed there at elevated levels. Figure 1 displays the
E-northern analysis for topoisomerase II alpha, indicating elevated expression in the
thymus compared to the other tissue types detected. Figure 2 shows the results of an
E-northern analysis of transcription factor ICBP90, which has been implicated in
topoisomerase II alpha expression. ICBP90 was also expressed at high levels in the thymus
relative to the other tissue types (Figure 2). A study by Hopfner et al. indicated that adult
thymus and fetal thymus contained the highest levels of ICBP90 using a 50-tissue RNA dot
blot protocol (Hopfner et al., Cancer Res 60, 121-128 (2000)). Most of the genes in this
cluster showed their highest levels of expression in the thymus.

Figure 3 shows the results of an E-northern analysis of monocarboxylate
transporter 4 (MCT4; formerly known as MCT3), which was grouped with genes associated
with proliferation. MCT4 is most evident in cells with a high glycolytic rate, such as
muscle, white blood cells, and tumor cells (Halestrap et al., Biochem J 343 (Pt 2), 281-299
(1999)). A group of multi-tissue northern blots from a recent publication indicates that
MCT4 is expressed at high levels in leukocytes but also in other tissue types (Price et
al., Biochem J 329, 321-328 (1998)). Furthermore, E-northern analysis indicated that
MCT4 was expressed at high levels in blood and white blood cells (Figure 3).
A previously uncharacterized gene expressed only in breast tissue was identified
in this study, and an E-northern analysis of its expression pattern is shown
in Figure 4. The distribution pattern of this gene's expression suggests that it can be used as a
marker for breast cancer. The E-northern analysis displays only tissues in which the gene of
interest is present at detectable levels, and breast tissue was the only such tissue for this
gene. This particular gene was under-expressed by -4.2 fold in IDC, making it particularly
useful as a diagnostic marker.

Another gene that may be used as a diagnostic marker, although it was not present in a
particular cluster, is secreted frizzled-related protein 1. This gene was under-expressed
in IDC by -17.7 fold, and the E-northern analysis shown in Figure 5 indicates that it was
expressed at the highest levels in breast tissue and in the cervix. Using the combination
of clustering, fold change analysis, and E-northern analysis on microarray data, one skilled
in the art can readily select additional therapeutic and diagnostic markers.



Although the present invention has been described in detail with reference to the
examples above, it is understood that various modifications can be made without departing
from the spirit of the invention. Accordingly, the invention is limited only by the following
claims. All cited patents and publications referred to in this application are herein
incorporated by reference in their entirety.
Table 5: BREAST / INFILTRATING LOBULAR CARCINOMA

# | SeqID | Genbank | Normal vs All | Normal vs Malignant | Normal vs SII and SIII
1 | 7 | AA017070 | 218.33+/-195.52; 77.15+/-79.13; N1=40, N2=17; Fold Change: 2.53; P-value: .00187 | X | X
2 | 15 | AA031790 | 336.45+/-181.35; 156.08+/-81.33; N1=40, N2=17; Fold Change: 2.16; P-value: .00003 | X | X
3 | 23 | AA044830 | 387.92+/-190.91; 188.55+/-88.55; N1=40, N2=17; Fold Change: 2.14; P-value: .00023 | X | X
4 | 24 | AA045145 | 262.21+/-180.28; 76.07+/-123.14; N1=40, N2=17; Fold Change: 3.26; P-value: .00038 | X | X
5 | 25 | AA046457 | 254.96+/-154.86; 128.89+/-118.57; N1=40, N2=17; Fold Change: 2.3; P-value: .00176 | X | X
6 | 31 | AA059396 | 383.25+/-127.97; 170.7+/-70.05; N1=40, N2=17; Fold Change: 2.32; P-value: 0 | X | 383.25+/-127.97; 120.28+/-48.53; N1=40, N2=17; Fold Change: 3.22; P-value: .01218
7 | 33 | AA059458 | 74.76+/-90.6; 314.12+/-111.83; N1=40, N2=17; Fold Change: 5.79; P-value: 0 | X | 74.76+/-90.6; 344.29+A46.75; N1=40, N2=17; Fold Change: 6.82; P-value: 0
8 | 41 | AA126704 | 312.64+/-137.34; 130.96+/-82.96; N1=40, N2=17; Fold Change: 2.5; P-value: .00009 | X | X
9 | 42 | AA127718 | 240.21+/-361.64; 75.73+/-121.03; N1=40, N2=17; Fold Change: 3.09; P-value: .00005 | X | X
10 | 43 | AA127727 | 212.97+/-123.48; 100.07+/-53.82; N1=40, N2=17; Fold Change: 2.1; P-value: .00014 | X | X
11 | 51 | AA133248 | 400.91+/-134.73; 201.52+/-119.8; N1=40, N2=17; Fold Change: 2.24; P-value: .00009 | X | X
12 


57 


AA1 42913 


302.34+/-222.83 
104.53+/-62.4 
Nl=40, N2=17 
Fold Change: 2.84 
P-value: 0 


X 


302.34+/-222.83 
68.29+/-31.77 
N1=40,N2=17. 
Fold Change: 4.03 
P-value: .00871 \ 


13 


62 


AA147751 


478.2+/-207.42 
245.52+/-144.78 
Nl=40, N2-17 
Fold Change: 2.03 
P-value: .00015 


X 


X 


14 


63 


AA147884 


46.86+/-55.16 
212.3+/-151.24 
N1=40,N2=17 
Fold Change: 3.93 
P-value: .00001 


X 


X 


15 


64 


AA149312 


374+/-139.43 
179.7+/-77.1 
Nl=40, N2=17 
Fold Change: 2.18 
P-value: .00003 


X 


X 


16 


65 


AA1 50501 


215.8+/-104 
97.75+A48.53 
N1=40,N2=17 
Fold Change: 2.27 
P-value: .00006 


X 


X 


17 


71 


AA1 58731 


287.72+/-241.22 

94.7647-99 
N1=40,N2=17 
Fold Change: 3.29 
P-value: .00036 


X 


X 


18 


72 


AA160156 


630.23+/-274.77 
297.85+M66.73 
Nl=40, N2=17 
Fold Change: 2.39 
P-value: .00076 


X 


X 


19 


75 


AA173572 


368.73+/-173.58 

140.6+/-66.1 
N1=40, N2=17
Fold Change: 2.52 
P-value: .00001 


X 


368.73+/-173.58 
101.84+/-30.25 
N1=40, N2=17
Fold Change: 3.17 
P-value: .00053 


20 


84 


AA203663 


288.39+/-92.75
151.54+/-90.12 
N1=40,N2=17 
Fold Change: 2.19 
P-value: .00062 


X 


X 


21 


88 


AA227778 


254.32+/-164.5 
129.32+/-121.52 
N1=40, N2=17
Fold Change: 2.21
P-value: .00551


X 


X 


22 


99 


AA369887 


326.24+/-259.48
1569.71+/-1564.61

N1=40, N2=17
Fold Change: 3.13 

P-value: .00723 


X 


X 


23 


117 


AA430314 


259.57+/-186.05
94.12+/-84.62
N1=40, N2=17

Fold Change: 2.81 
P-value: .00057 


X 


259.57+/-186.05
51.43+/-32.49
N1=40, N2=17

Fold Change: 4.33 
P-value: .0109 


24 


120 


AA447015 


226.67+/-173.74 
86.47+/-87.06 
N1=40, N2=17
Fold Change: 2.44 
P-value: .00239 


X 


226.67+/-173.74 
49.75+/-49

N1=40, N2=17
Fold Change: 3.69 

P-value: .04932 


25 


121 


AA448195 


82.22+/-92.11 
252.38+/-226.28

N1=40, N2=17
Fold Change: 2.56 

P-value: .00561 


X 


X 


26 


122 


AA450090 


285.47+/-226.15 
121.51+/-105.64 
N1=40,N2=17 
Fold Change: 2.56 
P-value: .00017 


X 


285.47+/-226.15 

80.74+/-67.6 
N1=40, N2=17
Fold Change: 3.67
P-value: .04277


27 


124 


AA452295 


220.36+/-116.43
43.55+/-34.23
N1=40, N2=17
Fold Change: 4.8 
P-value: 0 


X 


220.36+/-116.43

27.93+/-7.95
N1=40, N2=17
Fold Change: 6.64 
P-value: 0 


28 


129 


AA479033 


105.96+/-264.08 
699.96+/-1244.37 

N1=40,N2=17 
Fold Change: 3.25 

P-value: .01862 


X 


X 


29 


131 


AA480075 


331.5+/-159.34 
170.51+/-174.22 
N1=40,N2=17 
Fold Change: 2.36 
P-value: .00065 


X 


X 


30 


134 


AA486731 


417.18+/-216.76
258J8+/-279.38 

N1=40, N2=17
Fold Change: 2.26 

P-value: .0077 


X 


X 


31 


135 


AA488889 


298.86+/-194.94 
114.61+/-41.42 
N1=40,N2=17 

Fold Change: 2.16 
P-value: .00001 


X 


X 


32 


138 


AA502943 


439.24+/-110.96
200.97+/-110.89
N1=40, N2=17
Fold Change: 2.41 
P-value: 0 


X 


X 


33 


140 


AA508196 


475.57+/-315.6 
208.59+/-128.6 
N1=40, N2=17
Fold Change: 2.29 
P-value: .0014 


X 


X 


34 


142 


AA516420 


208.7+/-209.98 
762.28+/-919.5
N1=40,N2=17 
Fold Change: 2.83 
P-value: .00199 


X 


X 


35 


151 


AA526961 


417.14+/-237.24 
139.33+/-66.58
N1=40,N2=17 

Fold Change: 2.89 
P-value: 0 


X 


X 


36 


156 


AA534456 


1130.9+/-759.82 
504.53+/-276.17 
N1=40,N2=17 
Fold Change: 2.23 
P-value: .00282 


X 


X 


37 


160 


AA535218 


322.09+/-137.43 
130.51+/-83.58
N1=40,N2=17 
Fold Change: 2.69 
P-value: .00001 


X 


X 


38 


171 


AA584310 


402.55+/-323.55 
1185.08+/-725.81 
N1=40,N2=17 
Fold Change: 3.27 
P-value: .00003 


X 


X 


39 


172 


AA584403 


593.26+/-1291.79 
73.69+/-113.44
N1=40,N2=17 

Fold Change: 3.63 
P-value: .0001 


X 


593.26+/-1291.79 
46.94+/-41.5

N1=40, N2=17
Fold Change: 4.08 

P-value: .01967 


40 


175 


AA601511 


2941.11+/-4823.41
8196.8+/-10494.86

N1=40, N2=17
Fold Change: 3.59 

P-value: .04627 


X 


X 


41 


178 


AA609310 


285.39+/-160.8 
103.37+/-63.8 
N1=40, N2=17
Fold Change: 2.73 
P-value: .00003 


X 


X 


42 


180 


AA610522 


803+/-768.74 
2236.91+/-2047.57 

N1=40,N2=17 
Fold Change: 3.15 

P-value: .00504 


X 


803+/-768.74 
1948.9+/-1536.5
N1=40, N2=17
Fold Change: 3.65 
P-value: .04632 


43 


184 


AA621478 


398.69+/-325.12 
105.85+/-99.55 
N1=40, N2=17
Fold Change: 3.76 
P-value: .00002 


X 


X 


44 


189 


AA628467 


1145.06+/-502.33 
483.55+/-276.22
N1=40, N2=17
Fold Change: 2.59 
P-value: .00016 


X 


1145.06+/-502.33 
263.82+/-233.17 
N1=40,N2=17 
Fold Change: 5.48 
P-value: .04561 


45 


191 


AA631047 


615.9+/-364.24 
335.52+/-248.64
N1=40,N2=17 
Fold Change: 2.12 
P-value: .00214 


X 


X 


46 


194 


AA634799 


739.38+/-608.62 
265.99+/-273.02 
N1=40, N2=17
Fold Change: 3.37 
P-value: .00153 


X 


X 


47 


198 


AA669106 


84.29+/-131.22
224.41+/-230.31
N1=40, N2=17
Fold Change: 3.18 
P-value: .00001 


X 


X 


48 


200 


AA700621 


467.51+/-455.09
127.5+/-198.7 
N1=40,N2=17 
Fold Change: 3.36 
P-value: .00047 


X 


467.51+/-455.09
65.41+/-73.63
N1=40, N2=17

Fold Change: 4.6 
P-value: .03306 


49 


214 


AA742697 


1026.03+/-1071.41 
497.89+/-1362.07 

N1=40, N2=17
Fold Change: 3.28 

P-value: .00238 


X 


1026.03+/-1071.41 
72.76+/-23.65
N1=40, N2=17

Fold Change: 7.24 
P-value: 0 


50 


253 


AA921809 


459.15+/-1266.29 
1144.77+/-1121.05

N1=40,N2=17 
Fold Change: 2.76 

P-value: .00483 


X 


X 


51 


254 


AA921830 


92.93+/-115.1 
214.98+/-154.53 
N1=40,N2=17 
Fold Change: 2.53 
P-value: .00048 


X 


92.93+/-115.1 
328.17+/-235.36
N1=40, N2=17
Fold Change: 4.07 
P-value: .03148 


52 


255 


AA921922 


312.44+/-292.63 
101.23+/-57.27
N1=40,N2=17 
Fold Change: 2.73 
P-value: .00001 


X 


312.44+/-292.63 

79.08+/-33.3 
N1=40, N2=17
Fold Change: 3.21 
P-value: .00566 


53 


260 


AA936632 


X 


X 


125.03+/-127.3 
341.96+/-182.6
N1=40, N2=17
Fold Change: 3.13 
P-value: .02208 


54 


266 


AA976064 


363.9+/-153.14 
150.7+/-67.67 
N1=40, N2=17
Fold Change: 2.48 
P-value: 0 


X 


X


55 


281 


AC004770 


X 


X 


222.34+/-159.84

51.7+/-14.58
N1=40, N2=17
Fold Change: 3.51 
P-value: .00008 


56 


297 


AF052142 


307.17+/-169.55 
101.76+/-54.87 
N1=40, N2=17

Fold Change: 2.92 
P-value: 0 


X 


X 


57 


317 


AI018523 


422.08+/-187.64
137.17+/-133.59
N1=40, N2=17
Fold Change: 3.55 
P-value: .00002 


X 


X 


58 


321 


AI031771 


85.9+/-105.07 
273.11+/-256.97

N1=40, N2=17
Fold Change: 2.82 
P-value: .00563 


X 


X 


59 


324 


AI039005 


203.54+/-131.69 
79.78+/-68.07 
N1=40,N2=17 
Fold Change: 2.7 
P-value: .00048 


X 


X 


60 


325 


AI039722 


X 


X 


1007.24+/-1162.59
71.46+/-83.95 
N1=40,N2=17 

Fold Change: 11.94 
P-value: .00965 


61 


331 


AI057450 


381.32+/-1572.07
-3.82+/-29.02
N1=40, N2=17
Fold Change: 3.3 
P-value: .00001 


X 


381.32+/-1572.07 
-11.17+/-8.38 
N1=40, N2=17
Fold Change: 3.63 
P-value: 0 


62 


333 


AI073394 


124.23+/-101.36 
255.64+/-158.11 
N1=40, N2=17
Fold Change: 2.2 
P-value: .00025 


X 


X


63 


335 


AI073992


110.23+/-145.3
533.62+/-785.24 
N1=40,N2=17 
Fold Change: 3.22 
P-value: .00574 


X 


X 


64 


338 


AI079545 


248.94+/-138.38 
465.02+/-171.05 
N1=40, N2=17
Fold Change: 2.01 
P-value: .00007 


X 


X 


65 


341 


AI083598 


339.56+/-289.33
75.11+/-72.52
N1=40, N2=17

Fold Change: 3.79 
P-value: .00003 


X 


339.56+/-289.33 
38.38+/-30.41 
N1=40, N2=17
Fold Change: 5.72 
P-value: .00274 


66 


342 


AI086614 


301.2+/-152.86
128.33+/-84.7
N1=40, N2=17
Fold Change: 2.51 
P-value: .00041 


X 


X 


67 


343 


AI087975


68.87+/-58.02 
211.46+/-250.57 
N1=40, N2=17
Fold Change: 2.28 
P-value: .00976 


X 


X 


68 


344 


AI088609


709.25+/-600.21 
265.96+/-356.75
N1=40,N2=17 
Fold Change: 3.21 
P-value: .00094 


X 


X 


69 


345 


AI091154 


351.29+/-406.17 
74.97+/-110.43
N1=40,N2=17 

Fold Change: 4.1 
P-value: .00011 


X 


351.29+/-406.17 

12.49+/-5.56 
N1=40, N2=17
Fold Change: 8.99 
P-value: 0 


70 


351 


AI123555


300+/-164.6 
65.25+/-46.06 
N1=40,N2=17 
Fold Change: 4.55 
P-value: 0 


X


300+/-164.6 
48.57+/-47.56
N1=40, N2=17
Fold Change: 6 
P-value: .01993 


71 


359 


AI128820 


224.42+/-90.96 
108.28+/-86.45 
N1=40, N2=17
Fold Change: 2.34 
P-value: .00033 


X 


X 


72


361 


AI129626 


278.92+/-134.16
134.17+/-77.75
N1=40, N2=17

Fold Change: 2.13 
P-value: .00023 


X 


X 


73 


362 


AI131078 


299.48+/-223.81 
111.16+/-71.9 
N1=40, N2=17
Fold Change: 2.6 
P-value: .0002 


X 


299.48+/-223.81 
67.7+/-89.93 
N1=40, N2=17
Fold Change: 5.06 
P-value: .04594 


74 


370 


AI148006 


241.17+/-193.5
77.61+/-92.82
N1=40, N2=17
Fold Change: 2.93 
P-value: .00043 


X 


X 


75 


372 


AI149637 


212.6+/-241.64 
39.92+/-27.3
N1=40, N2=17
Fold Change: 3.37 
P-value: 0 


X 


212.6+/-241.64 
39.29+/-41.66 
N1=40, N2=17
Fold Change: 3.31
P-value: .04204 


76 


380 


AI189011


284.7+/-101.6 
126.14+/-81.81 
N1=40, N2=17
Fold Change: 2.75 
P-value: .00017 


X 


X 


77 


384 


AI200954 


524.84+/-319.36
253.81+/-173.45
N1=40, N2=17
Fold Change: 2.17 
P-value: .00291 


X 


X 


78 


386 


AI201965 


X 


X 


234.24+/-149.37
59.16+/-44.89
N1=40, N2=17
Fold Change: 3.61 
P-value: .03602 


79 


394 


AI222594 


431.73+/-162.38 
196.71+/-138.58 
N1=40, N2=17
Fold Change: 2.48 
P-value: .00005 


X 


X 


80 


395 


AI223817 


221.5+/-204.3 
686.72+/-465.96
N1=40, N2=17
Fold Change: 3.28 
P-value: .00041 


X 


X 


81 


399 


AI247837 


250.33+/-314.52
53.27+/-43.26
N1=40,N2=17 
Fold Change: 2.95 
P-value: .00014 


X 


250.33+/-314.52 
28.03+/-28.56 
N1=40,N2=17 
Fold Change: 4.49 
P-value: .00427 


82 


408 


AI277612 


1022.91+/-907.07 
101.24+/-106.96
N1=40, N2=17

Fold Change: 8.06 
P-value: 0 


X 


387.19+/-203.85 
584.56+/-51.28 
N1=40,N2=17 
Fold Change: 2.01 
P-value: .00012 


83 


417 


AI300876 


601.83+/-985.51 
26.36+/-32.43
N1=40,N2=17 
Fold Change: 7.1 
P-value: 0 


X 


601.83+/-985.51 
28.36+/-46.5
N1=40,N2=17 
Fold Change: 6.7 
P-value: .00688 


84 


418 


AI301060 


1095.7+/-461.79 
3285.81+/-2230.69 

N1=40,N2=17 
Fold Change: 2.58 

P-value: .00018 


X 


X 


85 


422 


AI333767 


201.68+/-104.32 

94.33+/-75
N1=40, N2=17
Fold Change: 2.32 
P-value: .00023 


X 


X 


86 


423 


AI333987 


X 


X 


208.53+/-320.79 
-12.06+/-45.78 
N1=40, N2=17
Fold Change: 4.29 
P-value: .00037 


87 


427 


AI341602 


137.44+/-280.1 
473.63+/-503.04 

N1=40, N2=17
Fold Change: 3.72

P-value: .00123 


X 


137.44+/-280.1 
1084.1+/-558.85 
N1=40, N2=17
Fold Change: 14.07 
P-value: .00013 


88 


430 


AI344312 


85.72+/-58.03 
241.24+/-132.01
N1=40,N2=17 
Fold Change: 2.77 
P-value: .00003 


X 


X 


89 


431 


AI346341 


635.18+/-426.52 
192.7+/-146.21
N1=40, N2=17
Fold Change: 2.74 
P-value: .00095 


X 


X 


90 


442 


AI369840 


239.87+/-167.43
91.16+/-73.21
N1=40, N2=17
Fold Change: 2.54 
P-value: .00091 


X 


X 


91 


447 


AI378584 


815.22+/-371.96 
289.2+/-132.28
N1=40, N2=17
Fold Change: 2.65 
P-value: 0 


X 


815.22+/-371.96
225.35+/-105.83
N1=40, N2=17
Fold Change: 3.53 
P-value: .02945 


92 


448 


AI379723 


380.22+/-173.64 
171.75+/-85.82 
N1=40, N2=17
Fold Change: 2.11
P-value: .00049


X 


X 


93 


459 


AI394013 


X 


X 


81.65+/-57.28 
206.8+/-28.72 
N1=40, N2=17
Fold Change: 3.01 
P-value: 0 


94 


462 


AI417267 


933.35+/-487.41 
367.83+/-178.5 
N1=40, N2=17

Fold Change: 2.35 
P-value: 0 


X 


933.35+/-487.41 
232.02+/-44.3 
N1=40, N2=17
Fold Change: 3.44 
P-value: 0 


95 


467 


AI419030 


445.97+/-259.12 
141.54+/-110.04
N1=40,N2=17 
Fold Change: 3.4 
P-value: .00002 


X 


445.97+/-259.12 
100.89+/-50.85 

N1=40, N2=17
Fold Change: 3.94 
P-value: .00968 


96 


468 


AI421837 


293.96+/-147.73 
122.58+/-60.8 
N1=40,N2=17 
Fold Change: 2.25 
P-value: .00003 


X 


X 


97 


477 


AI458003 


280.16+/-202.76 
58.35+/-64.44
N1=40,N2=17 
Fold Change: 4.09 
P-value: 0 


X 


280.16+/-202.76 
29.02+/-54.63 
N1=40, N2=17
Fold Change: 6.1 
P-value: .01261 


98 


484 


AI479262 


56.35+/-67.19 
253.01+/-258.86 
N1=40, N2=17
Fold Change: 3.34 
P-value: .00113 


X




99 


489 


AI492051 


382.34+/-177.78 
99.97+/-58.1 
N1=40,N2=17 
Fold Change: 3.83 
P-value: 0 


X 


382.34+/-177.78
84.79+/-58.36
N1=40, N2=17
Fold Change: 4.59 
P-value: .01274 


100 


493 


AI492879 


219.42+/-658.12 
360.39+/-664.73 
N1=40, N2=17
Fold Change: 3.18 
P-value: .00218 


X 


X 


101 


500 


AI524085 


388.89+/-529.52 
77.76+/-117.23
N1=40, N2=17
Fold Change: 3.83 
P-value: .00013 


X 


X 


102 


501 


AI525044 


316.89+/-143.08 
163.75+/-85.16 
N1=40, N2=17
Fold Change: 2.13
P-value: .00114


X 


X 


103 


505 


AI537407 


278.8+/-204.74 
783.29+/-533.91 
N1=40, N2=17
Fold Change: 2.81 
P-value: .00083 


X 


X 


104 


506 


AI539386 


1924.9+/-2430.34 
6121.55+/-7013.05 

N1=40,N2=17 
Fold Change: 3.2 

P-value: .00044 


X 


X 


105 


511 


AI554514 


90.74+/-52.8 
201.02+/-166.43

N1=40, N2=17
Fold Change: 2.08 

P-value: .00026 


X 


X 


106 


512 


AI557210 


129.15+/-140.98 
491.52+/-264.84 
N1=40, N2=17
Fold Change: 5.08 
P-value: 0 


X 


129.15+/-140.98

573+/-162.6
N1=40, N2=17
Fold Change: 6.68 
P-value: .00001 


107 


517 


AI566038 


257.62+/-109.32
124.43+/-63.25
N1=40, N2=17
Fold Change: 2.16 
P-value: .00015 


X 


X 


108 


520 


AI571525 


265.11+/-78.71
141.93+/-62.73
N1=40, N2=17
Fold Change: 2.04 
P-value: .00015 


X 


X 


109 


536 


AI624853 


373.05+/-166.36 
180.19+/-106.47 
N1=40, N2=17
Fold Change: 2.21 
P-value: .00004 


X 


X 


110 


540 


AI634852 


278.07+/-162.92 
122.35+/-122.97 
N1=40, N2=17
Fold Change: 2.6 
P-value: .00095 


X 


X 


111 


542 


AI638295 


X 


X 


220.74+/-876.87 

3.1+/-10.99 
N1=40, N2=17
Fold Change: 3.16 
P-value: 0 


112 


545 


AI650341 


123.6+/-154.23 
209.61+/-97.49 
N1=40, N2=17
Fold Change: 2.41 
P-value: .00028 


X 


X 


113 


546 


AI650514 


110.57+/-163.5 
295.11+/-242
N1=40,N2=17 
Fold Change: 2.56 
P-value: .00744 


X 


X 


114 


562 


AI658925 


542.56+/-347.67 
259.65+/-161.58 
N1=40,N2=17 
Fold Change: 2.07 
P-value: .00351 


X 


X 


115 


565 


AI659418 


261.02+/-116.11
133.75+/-108.49
N1=40,N2=17 
Fold Change: 2.41 
P-value: .00088 


X 


X 


116 


566 


AI659533 


563.4+/-201.34 
291.04+/-136.51 

N1=40,N2=17 
Fold Change: 2.1 

P-value: .00023 


X 


X 


117 


588 


AI680541 


510.08+/-201.29 
186.08+/-102.82 
N1=40,N2=17 
Fold Change: 2.84 
P-value: 0 


X 


510.08+/-201.29 
106.49+/-44.75 
N1=40, N2=17
Fold Change: 4.54 
P-value: .00246 


118 


591 


AI683911 


241.46+/-200.89 
27.24+/-52.93 
N1=40,N2=17 
Fold Change: 4.58 
P-value: 0 


X 


241.46+/-200.89 
32.69+/-57.65 
N1=40, N2=17
Fold Change: 3.74 
P-value: .01617 


119 


592 


AI684457 


96.99+/-74.31 
253.71+/-245.09 
N1=40, N2=17
Fold Change: 2.25 
P-value: .00425 


X 


X 


120 


593 


AI686114 


374.48+/-274.59 
120.83+/-92.86 
N1=40, N2=17
Fold Change: 3.03 
P-value: .0001 


X 


374.48+/-274.59
76.06+/-83.42
N1=40, N2=17
Fold Change: 4.43
P-value: .04695


121 


612 


AI701034 


215.78+/-96.65 
111.85+/-71.77 
N1=40, N2=17
Fold Change: 2.11 
P-value: .00036 


X 


X 


122 


618 


AI732274 


947.08+/-989.69 
288.99+/-458.46
N1=40, N2=17
Fold Change: 3.94 
P-value: .00251 


X 


X 


123 


619 


AI733679 


325.9+/-596.22 
48.5+/-33.81 
N1=40, N2=17
Fold Change: 3.21
P-value: .00002


X 


X 


124 


623 


AI740621 


231.84+/-247.13 
77.35+/-124.9 
N1=40, N2=17
Fold Change: 2.62 
P-value: .00315 


X 


X 


125 


627 


AI742002 


111.78+/-132.43 
379.6+/-168.26
N1=40, N2=17
Fold Change: 4.61 
P-value: 0 


X 


111.78+/-132.43 
388+/-292.79 
N1=40,N2=17 
Fold Change: 4.32 
P-value: .0111 


126 


629 


AI742239 


159.76+/-199.32
419.47+/-377.4
N1=40, N2=17
Fold Change: 3.29 
P-value: .00013 


X 


X 


127 


631 


AI742490 


601.57+/-252.84 
285.13+/-140.07 
N1=40, N2=17
Fold Change: 2.05 
P-value: .00003 


X 


X 


128 


632 


AI742521 


215.93+/-234.91 
23.91+/-22.33
N1=40,N2=17 

Fold Change: 4.4 
P-value: 0 


X 


215.93+/-234.91 
23.3+/-12.66 
N1=40, N2=17
Fold Change: 4.76 
P-value: .00002 


129 


635 


AI743671 


582.82+/-317.91 
281.49+/-185.49 
N1=40, N2=17
Fold Change: 2.26 
P-value: .00964 


X 


X 


130 


636 


AI743715 


312.02+/-238.55 
99.48+/-141.4
N1=40, N2=17
Fold Change: 3.47 
P-value: .0005 


X 


X 


131 


637 


AI743925 


663.58+/-309.38 
221.31+/-142.28 
N1=40, N2=17
Fold Change: 3.13 
P-value: 0 


X 


X 


132 


641 


AI751438 


144.67+/-188.73 
551.05+/-364.64 
N1=40, N2=17
Fold Change: 4.85 
P-value: 0 


X 


144.67+/-188.73 
612.92+/-347.94 
N1=40, N2=17
Fold Change: 5.61 
P-value: .02877 


133 


643 


AI758223 


833.52+/-665.83
89.52+/-74.43
N1=40, N2=17
Fold Change: 8.3 
P-value: 0 


X 


833.52+/-665.83
98.81+/-90.66
N1=40, N2=17
Fold Change: 8 
P-value: .02464 


134 


649 


AI761241 


883.3+/-332.12
415.64+/-208.2
N1=40, N2=17
Fold Change: 2.21 
P-value: .00005 


X 


X 


135 


650 


AI761274 


342.36+/-182.65
121.18+/-64.61
N1=40, N2=17
Fold Change: 2.86 
P-value: .00001 


X 


342.36+/-182.65
75.25+/-39.87
N1=40, N2=17

Fold Change: 4.5 
P-value: .01949 


136 


652 


AI761844 


278.83+/-138.41 
99.54+/-56.16 
N1=40, N2=17
Fold Change: 2.79 
P-value: .00001 


X 


278.83+/-138.41 
87.16+/-56.51 
N1=40, N2=17

Fold Change: 3.1 
P-value: .02791 


137 


653 


AI763136 


282.1+/-149.81 
118.7+/-131.83 
N1=40, N2=17
Fold Change: 2.53 
P-value: .00163 


X 


X 


138 


655 


AI766029 


271.74+/-528.19 
22.11+/-18.39
N1=40, N2=17
Fold Change: 3.71 
P-value: 0 


X 


271.74+/-528.19 
30.31+/-29.22 
N1=40, N2=17

Fold Change: 3.07 
P-value: .01978 


139 


657 


AI768325 


114.7+/-66.43 
257.51+/-172.22
N1=40, N2=17
Fold Change: 2.12 
P-value: .00044 


X 


X 


140 


664 


AI791182 


286.48+/-162.61 
621.07+/-388.18
N1=40, N2=17
Fold Change: 2.07 
P-value: .00052 


X 


X 


141 


668 


AI792635 


X 


X 


800.24+/-717.81
1968.88+/-866 
N1=40,N2=17 

Fold Change: 4.27 
P-value: .0038 


142 


674 


AI797276 


271.48+/-136.73
106.25+/-58.1
N1=40, N2=17
Fold Change: 2.56 
P-value: .00001 


X 


271.48+/-136.73
76.49+/-46.61
N1=40, N2=17
Fold Change: 3.58 
P-value: .02759 


143 


678 


AI799784 


603.99+/-383.42 
93.05+/-88.68 
N1=40, N2=17
Fold Change: 6.66 
P-value: 0 


X 


603.99+/-383.42 
82.71+/-78.29
N1=40, N2=17

Fold Change: 7.34 
P-value: .01379 


144 


684 


AI804054 


302.97+/-234.41
108.18+/-91.04
N1=40, N2=17
Fold Change: 2.83
P-value: .00011


X 


302.97+/-234.41 
77.58+/-36.68
N1=40,N2=17 
Fold Change: 3.3 
P-value: .01862 


145 


687 


AI806324 


211.46+/-131.17
108.84+/-79.43
N1=40, N2=17
Fold Change: 2.03 
P-value: .00874 


X 


X 


146 


691 


AI809953 


383.43+/-189.32
120.52+/-100.18
N1=40,N2=17 
Fold Change: 3.27 
P-value: .00013 


X 


X 


147 


693 


AI810266 


68.88+/-106.64 
761.49+/-1126.65

N1=40, N2=17
Fold Change: 6.3

P-value: .00013


X 


X 


148 


694 


AI810764 


202.16+/-159.83
1084.09+/-1401.59

N1=40, N2=17
Fold Change: 4.41 

P-value: .00007 


X 


X 


149 


701 


AI816835


360.85+/-289.77 
171.05+/-158.66 
N1=40, N2=17
Fold Change: 2.13 
P-value: .00229 


X 


X 


150 


704 


AI817967 


X 


X 


112.71+/-118.41
308.86+/-160.78
N1=40, N2=17
Fold Change: 3.45 
P-value: .00951 


151 


706 


AI818579 


394.08+/-228.07 
204.91+/-197.94
N1=40,N2=17 
Fold Change: 2.13 
P-value: .00391 


X 


X 


152 


712 


AI821472


519.11+/-694.13
-5.59+/-218.89
N1=40, N2=17
Fold Change: 5.69 
P-value: .00005 


X 


519.11+/-694.13 
-49.74+/-70.96 
N1=40, N2=17
Fold Change: 9.33 
P-value: 0 


153 


713 


AI823572 


232.21+/-195.63
91.57+/-60.62
N1=40, N2=17

Fold Change: 2.43 
P-value: .00008 


X 


X 


154 


721 


AI825936 


229.86+/-148.12 
98.58+/-81.47 
N1=40, N2=17
Fold Change: 2.58 
P-value: .00016 


X 


X 


155 


722 


AI826437 


45.86+/-118.99
281.35+/-448.25
N1=40, N2=17
Fold Change: 3.03 
P-value: .0122 


X 


X 


156 


744 


AI863167 


183.76+/-73.48
406.96+/-190.24

N1=40, N2=17
Fold Change: 2.16 
P-value: 0 


X 


X 


157 


747 


AI864898 


401.86+/-258.51
75.46+/-68.5
N1=40,N2=17 
Fold Change: 5.61 
P-value: 0 


X 


X 


158 


750 


AI871044 


766.39+/-500.99 
189.5+/-179.55
N1=40, N2=17

Fold Change: 4.03 
P-value: .00001 


X 


766.39+/-500.99
84.85+/-70.19
N1=40, N2=17
Fold Change: 8.12 
P-value: .00884 


159 


751 


AI872267 


267.23+/-203.1 
627.26+/-368.25 
N1=40,N2=17 
Fold Change: 2.55 
P-value: .00015 


X 


X 


160 


752 


AI879337 


431.51+/-184.18 
215.5+/-115.86
N1=40, N2=17

Fold Change: 2.18 
P-value: .00062 


X 


X 


161 


758 


AI888322 


X 


X 


319.22+/-320.74
71.54+/-51.15
N1=40,N2=17 
Fold Change: 3.78 
P-value: .03277 


162 


772 


AI916544 


151.27+/-163.24
373.43+/-334.2
N1=40,N2=17 
Fold Change: 2.45 
P-value: .00524 


X 


X 


163 


775 


AI917901 


601.53+/-812.45 
76.98+/-131.25
N1=40, N2=17
Fold Change: 4.95 
P-value: .00005 


X 


601.53+/-812.45
26.66+/-20.01
N1=40, N2=17
Fold Change: 7.3 
P-value: .00001 


164 


780 


AI924465 


448.27+/-478.27
149.48+/-115.97
N1=40, N2=17
Fold Change: 2.43 
P-value: .00214 


X 


X 


165 


787 


AI934361 


220.01+/-243.16 
54.43+/-44.52 
N1=40, N2=17

Fold Change: 3.1 
P-value: .00001 


X 


220.01+/-243.16
52.02+/-37.1
N1=40, N2=17
Fold Change: 3.01 
P-value: .03711 


166 


789 


AI934881 


316.72+/-226.37 
659.59+/-486.96
N1=40, N2=17
Fold Change: 2.01 
P-value: .00378 


X 


X 


167 


816 


AI968151 


127.39+/-61.78 
376.92+/-292.97
N1=40,N2=17 
Fold Change: 2.53 
P-value: .00031 


X 


X 


168 


817 


AI968379 


295.46+/-388.02 
-8.49+/-25.52
N1=40, N2=17
Fold Change: 6.27 
P-value: 0 


X 


295.46+/-388.02 

.59+/-34.6
N1=40,N2=17 
Fold Change: 5.43 
P-value: .00032 


169 


818 


AI968904 


738.79+/-292.65
307.62+/-119.37
N1=40, N2=17
Fold Change: 2.35 
P-value: 0 


X 


X 


170 


830 


AI972498 


286.51+/-112.64
135.46+/-66.44
N1=40, N2=17
Fold Change: 2.18 
P-value: .00003 


X 


X 


171 


832 


AI972873 


436.16+/-215 
132.01+/-99.1 
N1=40,N2=17 
Fold Change: 3.85 
P-value: 0 


X 


X 


172 


838 


AI983045 


281.02+/-338.08 
40.45+/-125.75
N1=40, N2=17

Fold Change: 4.78 
P-value: 0 


X 


281.02+/-338.08 
-9.19+/-15.96 
N1=40, N2=17
Fold Change: 7.52 
P-value: 0 


173 


857 


AL037805 


X 


X 


614.2+/-317.15 
183.89+/-87.99 
N1=40, N2=17
Fold Change: 3.13 
P-value: .01435 


174 


865 


AL040912 


304.56+/-132.78 
112.19+/-70.33 
N1=40,N2=17 
Fold Change: 2.8 
P-value: .00006 


X 


X 


175 


867 


AL042492 


809.69+/-853.09 
72.75+/-93.44
N1=40,N2=17 
Fold Change: 9.48 
P-value: 0 


X 


1022.91+/-907.07 
85.76+/-67.41 

N1=40, N2=17
Fold Change: 8.09 

P-value: .00176 


176 


876 


AL046941 


428.58+/-238.89
146.79+/-176.57
N1=40, N2=17
Fold Change: 4.06 
P-value: .00007 


X 


428.58+/-238.89

55.32+/-48.1 
N1=40,N2=17 
Fold Change: 7.58 
P-value: .01267 


177 


881 


AL048962 


944+/-354.29 
399.3+/-211.63
N1=40, N2=17
Fold Change: 2.5 
P-value: .00001 


X 


944+/-354.29
289.62+/-184.81
N1=40, N2=17
Fold Change: 3.52 
P-value: .03411 


178 


893 


AL050367 


257.59+/-77.75
111.77+/-59.21
N1=40, N2=17
Fold Change: 2.47 
P-value: 0 


X 


257.59+/-77.75
76.12+/-36.74
N1=40, N2=17
Fold Change: 3.45 
P-value: .01201 


179 


894 


AL079279 


313.49+/-189.76 
127.56+/-77.14 
N1=40, N2=17
Fold Change: 2.4 
P-value: .00036 


X 


X 


180 


896 


AL079707 


261.69+/-226.08 
73.98+/-35.27
N1=40,N2=17 
Fold Change: 3.16 
P-value: 0 


X 


X 


181 


902 


AL118746


234.63+/-113.05
84J7+/-47.29 
N1=40, N2=17

Fold Change: 2.86 
P-value: .00001 


X 


234.63+/-113.05
46.19+/-34.43 
N1=40,N2=17 

Fold Change: 5.14 
P-value: .0179 


182 


905 


AW000952 


98.9+/-72.25
204.67+/-105.21
N1=40, N2=17
Fold Change: 2.16 
P-value: .00011 


X 


X 


183 


907 


AW002846 


283.14+/-201.6 
119.62+/-87.38 
N1=40, N2=17
Fold Change: 2.43 
P-value: .00065 


X 


X 


184 


908 


AW002941 


959.64+/-342.08 
493.25+/-243.21 
N1=40,N2=17 
Fold Change: 2.13 
P-value: .0001 


X 


X 


185 


916 


AW006235 


346.9+/-210.26 
121.01+/-58.03 
N1=40, N2=17
Fold Change: 2.69 
P-value: 0 


X 


X 


186 


917 


AW006352 


235.29+/-179.11
534.97+/-420.56 
N1=40,N2=17 
Fold Change: 2.17 
P-value: .00953 


X 


X 


187 


921 


AW007080 


223.2+/-116.87
69.24+/-50.48
N1=40, N2=17
Fold Change: 3.27 
P-value: .00001 


X 


223.2+/-116.87
36.39+/-14.01 
N1=40,N2=17 
Fold Change: 5.16 
P-value: .0001


188 


926 


AW007803 


153.39+/-142.06 
442.5+/-397.54
N1=40, N2=17
Fold Change: 2.55
P-value: .00867


X 


X 


189 


931 


AW014155 


214.48+/-209.56 
624.36+/-372.34
N1=40,N2=17 
Fold Change: 3.15 
P-value: .00005 


X 


X 


190 


953 


AW051492 


442.65+/-332.99
203.39+/-140.38
N1=40, N2=17
Fold Change: 2.22 
P-value: .00151 


X 


X 


191 


957 


C17781 


229.36+/-141.71
84.23+/-69.19
N1=40, N2=17
Fold Change: 2.59 
P-value: .00012 


X 


X 


192 


975 


F22640 


416.82+/-153.5 
204.94+/-169.19 
N1=40, N2=17
Fold Change: 2.37 
P-value: .00007 


X 


X 


193 


985 


H16568


288.53+/-212.27 
74.99+/-76.74
N1=40,N2=17 
Fold Change: 3.32 
P-value: .00019 


X 


288.53+/-212.27
32.47+/-46.57
N1=40,N2=17 
Fold Change: 5.01 
P-value: .01332 


194 


988 


H30384 


194.93+/-133.51 
479.18+/-480.95 
N1=40, N2=17
Fold Change: 2.18 
P-value: .00329 


X


X 


195 


992 


H54254 


377.04+/-687.01 
38.27+/-23.01 
N1=40,N2=17 
Fold Change: 4.25 
P-value: 0 


X 


377.04+/-687.01 
36.82+/-32.95
N1=40,N2=17 
Fold Change: 4.51 
P-value: .00966 


196 


997 


H92988 


390.91+/-149.13 
205.04+/-140.06
N1=40, N2=17
Fold Change: 2.33
P-value: .00168 


X 


X 


197 


1074 


N42752 


63.77+/-48.02 
291.54+/-224.99 
N1=40, N2=17
Fold Change: 3.86 
P-value: .00006 


X 


X 


198 


1085 


N56877 


109.5+/-80.79 
402.12+/-388.61 
N1=40, N2=17
Fold Change: 3 
P-value: .00087 


X 


X 





199 


1090 


N63913 


458.01+/-316.71 
67.39+/-79.5 
N1=40, N2=17
Fold Change: 6.57
P-value: 0


X 


458.01+/-316.71
8.79+/-40.55
N1=40, N2=17
Fold Change: 11.59
P-value: .00004 


200 


1101 


R08000 


502.76+/-694.51 
82.03+/-53.39 
N1=40, N2=17
Fold Change: 3.78 
P-value: 0 


X 


502.76+/-694.51
90.53+/-93.25
N1=40, N2=17
Fold Change: 4.05 
P-value: .04964 


201 


1104 


R20784 


1112.78+/-843.96 
359.34+/-233.36
N1=40, N2=17
Fold Change: 2.91 
P-value: .00005 


X 


X


202 


1105 


R39938 


111.89+/-67.41 
222.22+/-111.16
N1=40, N2=17
Fold Change: 2.12 
P-value: .00002 


X 


X 


203 


1106 


R42575 


90.17+/-38.15 
215.36+/-156.29
N1=40, N2=17
Fold Change: 2.01 
P-value: .00211 


X 


X 


204 


1112 


R54660 


200.26+/-133.86
48.69+/-33.36
N1=40, N2=17
Fold Change: 3.43 
P-value: 0 


X 


200.26+/-133.86 
29.39+/-27.33 
N1=40,N2=17 

Fold Change: 4.69 
P-value: .0025 


205 


1116 


R70255 


241.29+/-181.34
14.29+/-38.71
N1=40, N2=17
Fold Change: 5.79 
P-value: 0 


X 


241.29+/-181.34
-8.34+/-15.47
N1=40, N2=17
Fold Change: 7.98 
P-value: 0 


206 


1118 


R74561 


425.23+/-350.96 
879.43+/-654.71 
N1=40, N2=17
Fold Change: 2.16 
P-value: .0019 


X 


X 


207 


1119 


R83604 


304.76+/-867.74 
-32.63+/-64.18 
N1=40, N2=17
Fold Change: 3.15 
P-value: .00017 


X 


X 


208 


1125 


T61106 


180.38+/-114.3
349.03+/-164.74
N1=40, N2=17
Fold Change: 2.35 
P-value: .00001 


X 


X 


209 


1132 


T85314 


X 


X 


166.2+/-116.99
644.58+/-401.95

N1=40, N2=17
Fold Change: 4.09 

P-value: .03546 


210 


1171 


W02823 


217.4+/-87.18 
81.39+/-47.18 
N1=40, N2=17
Fold Change: 2.83 
P-value: .00001 


X 


217.4+/-87.18 
53.69+/-25.71 
N1=40, N2=17
Fold Change: 3.92 
P-value: .00806 


211 


1173 


W07043 


299.21+/-164.12 
105.66+/-83.76 
N1=40, N2=17

Fold Change: 2.82
P-value: .00008 


X 


299.21+/-164.12 
59.94+/-40.54 
N1=40, N2=17
Fold Change: 4.46 
P-value: .01951 


212 


1174 


W07304 


1139.71+/-444.58
502.93+/-458.99
N1=40, N2=17
Fold Change: 2.64 
P-value: .00012 


X 


1139.71+/-444.58 
349.93+/-213.71 
N1=40, N2=17
Fold Change: 3.49 
P-value: .04978 


213 


1180 


W27541 


X 


X 


486.94+/-189.31
113.57+/-41.71 
N1=40,N2=17 

Fold Change: 4.17 
P-value: .0025 


214 


1183 


W32480 


720.17+/-951.89
76.05+/-158.18
N1=40, N2=17

Fold Change: 7.94 
P-value: 0 


X 


720.17+/-951.89 
18.91+/-12.14 
N1=40, N2=17
Fold Change: 12.97 
P-value: 0 


215 


1184 


W37770 


208.87+/-62 
108.93+/-55.29
N1=40, N2=17
Fold Change: 2.1 
P-value: .00006 


X 


X 


216 


1185 


W37896 


499.73+/-192.2
1636.96+/-1336.48

N1=40, N2=17
Fold Change: 2.49 

P-value: .00074 


X 


X 


217 


1198 


W72338 


464.08+/-121.49 
964.48+/-427.69
N1=40, N2=17
Fold Change: 2 
P-value: 0 


X 


X 


218 


1199 


W72347 


368.08+/-157.32 
134.9+/-113.13
N1=40, N2=17
Fold Change: 3.01 
P-value: .00008 


X 


X 


219 


1200 


W72407 


234.77+/-159.7 
50.76+/-52.77 
N1=40, N2=17
Fold Change: 4.25 
P-value: 0 


X 


234.77+/-159.7 
44.31+/-63.03 
N1=40, N2=17
Fold Change: 5.12 
P-value: .03464 


220 


1201 


W72511 


988.5+/-437.53
477.34+/-271.59
N1=40, N2=17
Fold Change: 2.11 
P-value: .00006 


X 


X 


221 


1204 


W73386 


248.29+/-403.35 
35.26+/-68.97
N1=40, N2=17

Fold Change: 3.18 
P-value: .0001 


469.37+/-905.14 
101.34+/-51.41 
N1=17, N2=7

Fold Change: 2.82 
P-value: .01061 


X 


222 


1207 


W73890 


223.01+/-130.77 
84.82+/-49.32
N1=40, N2=17
Fold Change: 2.49 
P-value: 0 


X


X


223 


1246 


Z99386 


611.71+/-209.91
288.23+/-106.96
N1=40, N2=17
Fold Change: 2.19 
P-value: .00001 


X 


X 
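By way of illustration only, each non-"X" entry in the table above can be treated as a record holding the two group means with their standard deviations, the group sizes N1 and N2, the fold change and the P-value. The Python sketch below parses one such entry from its text form and checks it against a fold-change cut-off of 2 and a P-value cut-off of 0.05. These cut-offs, the `Comparison` record and the `parse_cell` and `passes` functions are assumptions made for the example and are not stated in this portion of the document as the selection criteria for Tables 1-5. The fold change and P-value are read as reported rather than recomputed from the means, since this portion of the document does not state how they were derived.

```python
import re
from dataclasses import dataclass


# Hypothetical record for one non-"X" table entry (illustrative only).
@dataclass
class Comparison:
    mean1: float
    sd1: float
    mean2: float
    sd2: float
    n1: int
    n2: int
    fold_change: float  # as reported in the table, not recomputed
    p_value: float      # as reported in the table, not recomputed


_MEAN_SD = re.compile(r"([\d.]+)\+/-([\d.]+)")


def _mean_sd(text):
    """Split an entry such as '217.4+/-87.18' into (mean, standard deviation)."""
    match = _MEAN_SD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a mean+/-SD entry: {text!r}")
    return float(match.group(1)), float(match.group(2))


def parse_cell(lines):
    """Parse the five text lines of one entry: two mean+/-SD lines, N1/N2, fold change, P-value."""
    if len(lines) != 5:
        raise ValueError("expected exactly five lines")
    mean1, sd1 = _mean_sd(lines[0])
    mean2, sd2 = _mean_sd(lines[1])
    n1, n2 = (int(n) for n in re.findall(r"N\d=(\d+)", lines[2]))
    fold_change = float(lines[3].split(":", 1)[1])
    p_value = float(lines[4].split(":", 1)[1])
    return Comparison(mean1, sd1, mean2, sd2, n1, n2, fold_change, p_value)


def passes(entry, min_fold=2.0, max_p=0.05):
    """Assumed cut-offs for illustration: fold change >= min_fold and P-value < max_p."""
    return entry.fold_change >= min_fold and entry.p_value < max_p


# Usage with the first entry for SeqID 1173 (W07043).
entry = parse_cell([
    "299.21+/-164.12",
    "105.66+/-83.76",
    "N1=40, N2=17",
    "Fold Change: 2.82",
    "P-value: .00008",
])
print(entry.n1, entry.n2, passes(entry))  # 40 17 True
```

Every non-"X" entry shown in this portion of the table reports a fold change of at least 2 and a P-value below 0.05, so each would pass these illustrative cut-offs.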
1. A method of diagnosing breast cancer in a patient, comprising:

(a) detecting the level of expression in a tissue sample of two or more genes from 
Tables 1-5; wherein differential expression of the genes in Tables 1-5 is indicative of breast
cancer. 

2. A method of detecting the progression of breast cancer in a patient, comprising: 
(a) detecting the level of expression in a tissue sample of two or more genes from 

Tables 1-5; wherein differential expression of the genes in Tables 1-5 is indicative of breast
cancer progression. 

3. A method of monitoring the treatment of a patient with breast cancer, comprising: 

(a) administering a pharmaceutical composition to the patient; 

(b) preparing a gene expression profile from a cell or tissue sample from the 
patient; and 

(c) comparing the patient gene expression profile to a gene expression profile from a
cell population selected from the group consisting of normal breast cells and cancerous 
breast cells. 

4. A method of treating a patient with breast cancer, comprising: 

(a) administering to the patient a pharmaceutical composition, wherein the 
composition alters the expression of at least one gene in Tables 1-5; 

(b) preparing a gene expression profile from a cell or tissue sample from the 
patient comprising tumor cells; and 

(c) comparing the patient expression profile to a gene expression profile selected 
from the group consisting of normal breast cells and cancerous breast cells. 

5. A method of typing breast cancer in a patient, comprising:

(a) detecting the level of expression in a tissue sample of two or more genes from
Tables 1-5; wherein differential expression of the genes in Tables 1-5 is indicative of a type
of breast cancer selected from a group consisting of infiltrating ductal carcinoma,
microinvasive carcinoma, cribriform carcinoma, stage I carcinoma, stage II carcinoma, stage
III carcinoma or lobular carcinoma.



6. A method of detecting the presence or progression of infiltrating ductal carcinoma in
a patient, comprising: 

(a) detecting the level of expression in a tissue sample of two or more genes from 
Tables 1-5; wherein differential expression of the genes in Tables 1-5 is indicative of
infiltrating ductal carcinoma progression.

7. A method of monitoring the treatment of a patient with infiltrating ductal carcinoma, 
comprising: 

(a) administering a pharmaceutical composition to the patient;

(b) preparing a gene expression profile from a cell or tissue sample from the 
patient; and 

(c) comparing the patient gene expression profile to a gene expression profile from a
cell population comprising normal breast cells or to a gene expression profile from a cell
population comprising infiltrating ductal carcinoma cells or to both.

8. A method of treating a patient with infiltrating ductal carcinoma, comprising: 
(a) administering to the patient a pharmaceutical composition, wherein the
composition alters the expression of at least one gene in Tables 1-5;

(b) preparing a gene expression profile from a cell or tissue sample from the
patient comprising infiltrating ductal carcinoma cells; and

(c) comparing the patient expression profile to a gene expression profile from an 
untreated cell population comprising infiltrating ductal carcinoma cells. 

9. A method of diagnosing a microinvasive form of breast tumor in a patient,
comprising: 

(a) detecting the level of expression in a tissue sample of two or more genes 
from Tables 1-5; wherein differential expression of the genes in Tables 1-5 is indicative of a 
microinvasive form of breast cancer. 


10. A method of detecting the progression of a microinvasive form of breast cancer in a
patient, comprising: 
(a) detecting the level of expression in a tissue sample of two or more genes from
Tables 1-5; wherein differential expression of the genes in Tables 1-5 is indicative of the 
progression of a microinvasive form of breast cancer. 

11. A method of monitoring the treatment of a patient with a microinvasive form of
breast cancer, comprising: 

(a) administering a pharmaceutical composition to the patient; 

(b) preparing a gene expression profile from a cell or tissue sample from the 
patient; and 

(c) comparing the patient gene expression profile to a gene expression profile from a
cell population comprising normal breast cells or to a gene expression profile from a cell
population comprising microinvasive breast cancer cells or to both.

12. A method of treating a patient with a microinvasive form of breast cancer, 
comprising:

(a) administering to the patient a pharmaceutical composition, wherein the 
composition alters the expression of at least one gene in Tables 1-5; 

(b) preparing a gene expression profile from a cell or tissue sample from the 
patient comprising microinvasive breast cancer cells; and 

(c) comparing the patient expression profile to a gene expression profile from an
untreated cell population comprising microinvasive breast cancer cells.

13. A method of differentiating microinvasive breast cancer from a benign growth in a 
patient, comprising: 

(a) detecting the level of expression in a tissue sample of two or more genes from
Tables 1-5; wherein differential expression of the genes in Tables 1-5 is indicative of
microinvasive breast cancer rather than a benign growth.

14. A method of screening for an agent capable of modulating the onset or progression 
of breast cancer, comprising:

(a) preparing a first gene expression profile of a cell population comprising 
breast cancer cells, wherein the expression profile determines the expression level of one or 
more genes from Tables 1-5; 

(b) exposing the cell population to the agent; 
(c) preparing a second gene expression profile of the agent-exposed cell
population; and 

(d) comparing the first and second gene expression profiles. 

15. The method of claim 14, wherein the breast cancer is an infiltrating ductal carcinoma.

16. The method of claim 14, wherein the breast cancer is a microinvasive breast cancer.

17. A composition comprising at least two oligonucleotides, wherein each of the
oligonucleotides comprises a sequence that specifically hybridizes to a gene in Tables 1-5.

18. A composition according to claim 17, wherein the composition comprises at least 3 
oligonucleotides. 

19. A composition according to claim 17, wherein the composition comprises at least 5
oligonucleotides. 

20. A composition according to claim 17, wherein the composition comprises at least 7 
oligonucleotides. 


21. A composition according to claim 17, wherein the composition comprises at least 10 
oligonucleotides. 

22. A composition according to any one of claims 17-21, wherein the oligonucleotides 
are attached to a solid support.

23. A composition according to claim 22, wherein the solid support is selected from a 
group consisting of a membrane, a glass support, a filter, a tissue culture dish, a polymeric 
material, a bead and a silica support. 


24. A solid support comprising at least two oligonucleotides, wherein each of the 
oligonucleotides comprises a sequence that specifically hybridizes to a gene in Tables 1-5. 
25. A solid support according to claim 24, wherein the oligonucleotides are covalently
attached to the solid support. 

26. A solid support according to claim 24, wherein the oligonucleotides are
non-covalently attached to the solid support.

27. A solid support according to claim 24, wherein the support comprises at least about 
10 different oligonucleotides in discrete locations per square centimeter. 

28. A solid support according to claim 24, wherein the support comprises at least about
100 different oligonucleotides in discrete locations per square centimeter. 

29. A solid support according to claim 24, wherein the support comprises at least about 
1000 different oligonucleotides in discrete locations per square centimeter. 


30. A solid support according to claim 24, wherein the support comprises at least about 
10,000 different oligonucleotides in discrete locations per square centimeter. 

31. A computer system comprising: 

(a) a database containing information identifying the expression level in breast
tissue of a set of genes comprising at least two genes in Tables 1-5; and 
(b) a user interface to view the information. 

32. A computer system of claim 31, wherein the database further comprises sequence
information for the genes.

33. A computer system of claim 31, wherein the database further comprises information
identifying the expression level for the genes in normal breast tissue. 

34. A computer system of claim 31, wherein the database further comprises information
identifying the expression level for the genes in breast cancer tissue. 

35. A computer system of claim 34, wherein the breast cancer tissue comprises 
infiltrating ductal carcinoma cells. 
36. A computer system of claim 34, wherein the breast cancer tissue comprises
microinvasive breast cancer cells. 

37. A computer system of any of claims 31-36, further comprising records including
descriptive information from an external database, which information correlates said genes 
to records in the external database. 

38. A computer system of claim 37, wherein the external database is GenBank. 


39. A method of using a computer system of any one of claims 31-36 to present 
information identifying the expression level in a tissue or cell of at least one gene in Tables 
1-5, comprising: 

(a) comparing the expression level of at least one gene in Tables 1-5 in the tissue or
cell to the level of expression of the gene in the database.

40. A method of claim 39, wherein the expression levels of at least two genes are
compared. 

41. A method of claim 39, wherein the expression levels of at least five genes are
compared. 

42. A method of claim 39, wherein the expression levels of at least ten genes are
compared. 


43. A method of claim 39, further comprising displaying the level of expression of at 
least one gene in the tissue or cell sample compared to the expression level in breast cancer. 

44. A kit comprising at least one solid support of any one of claims 24-30 packaged with 
gene expression information for said genes.

45. A kit of claim 44, wherein the gene expression information comprises gene 
expression levels in a breast cancer tissue or cell sample. 
46. A kit of claim 45, wherein the gene expression information is in an electronic
format. 
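As a further illustration, the database of claims 31 to 34 and the comparison of claim 39 may be sketched with Python's standard-library `sqlite3` module. The sketch stores one reference expression level per gene for normal tissue and for breast cancer tissue, and expresses the comparison as the ratio of the sample level to each stored level. The table layout, the column names, the `build_db` and `compare_sample` functions, the identifiers `GENE_A` and `GENE_B`, the numeric values and the use of a simple ratio are all hypothetical and are not taken from this document.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory database from hypothetical (accession, tissue, mean_level) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE expression ("
        " accession TEXT NOT NULL,"
        " tissue TEXT NOT NULL CHECK (tissue IN ('normal', 'cancer')),"
        " mean_level REAL NOT NULL CHECK (mean_level > 0),"
        " PRIMARY KEY (accession, tissue))"
    )
    con.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    con.commit()
    return con


def compare_sample(con, sample_levels):
    """Return {accession: {tissue: sample_level / reference_level}} for each sample gene."""
    result = {}
    for accession, level in sample_levels.items():
        refs = con.execute(
            "SELECT tissue, mean_level FROM expression"
            " WHERE accession = ? ORDER BY tissue",
            (accession,),
        ).fetchall()
        # Ratio of the sample level to each stored reference level (assumed comparison).
        result[accession] = {tissue: level / ref for tissue, ref in refs}
    return result


# Hypothetical reference values, for illustration only.
con = build_db([
    ("GENE_A", "normal", 200.0),
    ("GENE_A", "cancer", 50.0),
    ("GENE_B", "normal", 100.0),
    ("GENE_B", "cancer", 400.0),
])
print(compare_sample(con, {"GENE_A": 100.0, "GENE_B": 400.0}))
# {'GENE_A': {'cancer': 2.0, 'normal': 0.5}, 'GENE_B': {'cancer': 1.0, 'normal': 4.0}}
```

Ordering the query by tissue makes the returned mapping deterministic, and a gene absent from the database yields an empty mapping rather than an error.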